[Y-Indology] Identifying Verb Inflections in Sanskrit Morphology

March 24, 2005

Dear Friends,

I am developing a system that identifies Sanskrit verbs by applying the
grammar in reverse. I have written a paper on it for feedback, to be presented
at SIMPLE05 (IIT Kharagpur). Kindly advise me on how to make it comprehensive
and useful. A copy of the paper follows this message. All suggestions are
welcome.

Deep regards.

Sudhir Mishra

Identifying Verb Inflections in Sanskrit Morphology

Sudhir Kumar Mishra, Girish Nath Jha

Special Center for Sanskrit Studies,

J.N.U., New Delhi -67

mishra_skumar78

girishj

Abstract

The paper presents a model for identifying Sanskrit verb inflections, intended
to describe correctly the verbs in a laukika Sanskrit text. Verbs play an
important role in the syntactico-semantic relations of the sentence. The
overall idea is to identify and analyze Sanskrit verbs correctly so that any
Sanskrit to Indian language machine translation system can benefit from this
processing. The overall model of the system is as follows:

VERB FORMS
↓
INFLECTION ID
↓
TAM ID
↓
PREFIX ID
↓
VERB SPLITTING
↓
VERB ID

1. Introduction

Sanskrit verb forms are very complex. They carry tense, aspect, person and
number information all in the inflected form. Besides, they can also contain
derivational elements carrying semantic information such as causation, desire,
repetition, negation etc. It is therefore very difficult to split a verb form
and separate the verb root from the complex information units encoded in it.

Sanskrit has about 2000 verb roots classified in 10 morphological and semantic
classes called gaṇas. These can be further sub-classified as normal forms
(without any of the 12 derivational affixes: 11 listed by Pāṇini [P 3.1.32],
and 1 more, 'kvip', added by Kātyāyana) and derived forms: ṇijanta (causative:
ṇic), sannanta (expressing desire: san, kyac, kāmyac, kvip, kyaṅ, kyaṣ, ṇiṅ,
yak, āy and iyaṅ) and yaṅanta (reduplicated: yaṅ and yaṅluṅanta). Further,
these can have ātmane and parasmai forms in 10 lakāras and 3 x 3 person and
number combinations, and can also potentially be prefixed with 22 prefixes.
Finally, there could be innumerable nāmadhātus (denominative verbs, i.e. verbs
derived from nouns). We have made a rough calculation of all potential verb
forms in Sanskrit: around 10,29,60,000 (102,960,000 in international digit
grouping), plus nāmadhātus. The distribution of Sanskrit verbs can be
understood as follows:

VR [2000]
├ san
├ kyac
├ kāmyac
├ kvip
├ kyaṅ
├ kyaṣ
├ ṇiṅ
├ ṇic
├ yaṅ
├ yak
├ āy
└ iyaṅ
+ one normal form
↓
TAM [10 lakāras]
↓
┌───────────────┐
parasmai          ātmane
↓                 ↓
10x9 forms        10x9 forms
↓                 ↓
22 upasarga       22 upasarga
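The rough figure quoted above can be reproduced by multiplying the factors
named in the text. The sketch below is only an illustration of that
arithmetic; the constant and function names are our own, and the count treats
every combination as a possible form, which is an upper bound rather than a
claim about attested forms.

```python
# Rough upper bound on finite verb forms, using the factors named in the text.
ROOTS = 2000                 # approximate number of verb roots
DERIVATION_TYPES = 12 + 1    # 12 derivational affixes plus one normal form
LAKARAS = 10                 # tense/mood categories
PADAS = 2                    # parasmai and atmane
PERSON_NUMBER = 3 * 3        # three persons x three numbers
UPASARGAS = 22               # prefixes


def potential_forms() -> int:
    """Multiply all factors to obtain the rough count of potential forms."""
    return (ROOTS * DERIVATION_TYPES * LAKARAS * PADAS
            * PERSON_NUMBER * UPASARGAS)


def indian_grouping(n: int) -> str:
    """Format n with Indian digit grouping: last three digits, then pairs."""
    s = str(n)
    if len(s) <= 3:
        return s
    head, tail = s[:-3], s[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    return ",".join(groups + [tail])


total = potential_forms()
print(total, indian_grouping(total))
```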

Therefore the approach followed by many, of storing all Sanskrit verb forms,
is not going to work. Hence we propose the Pāṇinian approach in reverse for
parsing the complex verb forms in Sanskrit, in the following sequence:

· the verb inflection (parasmai / ātmane) is identified from the database and
a rough guess is made about the verb,
· the lakāra information based on the inflection is obtained,
· each of the 12 derivational affixes is evaluated,
· each of the 22 upasargas (prefixes) is searched,
· the verb root is determined by weeding out the other elements and by
database matching.

The verb root thus identified, together with all the other potential
components and the grammatical information gathered in the form of verb tags,
can potentially be used in a machine translation system translating from
Sanskrit.

2. Previous work

Though a comprehensive machine analysis of Sanskrit verbs is still awaited,
the following work, mostly under government funding, can be mentioned as
relevant in this context:

· The Prajna project of ASR Melkote [1] claims to offer a module for the
generation and analysis of 400 important Sanskrit roots in three voices
(Active, Passive and Impersonal), 10 lakāras, 6 tenses and 4 moods.
· Aiba (2004) [2] claims to have developed a Verb Analyzer for classical
Sanskrit which can parse Sanskrit verbs in Present, Aorist, Perfect, Future,
Passive and Causative forms. The site actually works only for some verbs and
itself states that the results are not reliable.
· The Desika project of TDIL, Govt. of India [3] claims to be an NLU system
for the generation and analysis of plain and accented written Sanskrit texts
based on the grammar rules of Pāṇini's Aṣṭādhyāyī. It also claims to have a
database based on the Amarakośa and heuristics based on the Nyāya and Mīmāṃsā
śāstras, and claims to analyze Vedic texts as well.
· The RCILTS project at SC&SS, JNU [4] has reportedly stored all verb forms of
Sanskrit in a database.
· The Śābdabodha project of TDIL, Govt. of India [5] claims to be an
interactive application for analyzing the semantic and syntactic structure of
Sanskrit sentences.
· The ASR Melkote website reports that a Sanskrit Authoring System is under
development at C-DAC Bangalore. The system is supposed to provide tools for
morphological, syntactic and semantic analysis, with word-splitting programs
for sandhi and samāsa [1].
· Cardona (2004) [6] discussed Pāṇini's derivational system with reference to
aspects of linguistics, grammar and computer science.
· Whitney (2002) [7] listed all the quotable roots of the Sanskrit language
together with the tense and conjugation system.
· Mishra and Jha (2004) [8] describe a module (Sanskrit Kāraka Analyzer) for
the identification and description of kārakas according to the Pāṇinian kāraka
formulations.
· Edgren (1885) [9] discussed the verb roots of the Sanskrit language
according to the Sanskrit grammarians.
· Joshi (1962) [10] presents a linguistic analysis of the verbs and nouns of
the Sanskrit language.

The present work differs from the work mentioned above in that it identifies
the verb by applying Pāṇini's rules in reverse with the help of a relational
database, as described in section 1 (Introduction). The system can also be
used to identify a sentence as active or passive voice, with a complete
analysis of the verb.

3. Module description

Pāṇini's verb formulations are very complex and involve a balanced interplay
of morphological and syntactic information. We propose the following modules
for Sanskrit verb analysis:

· TGL id module (verb inflection, class and TAM id),
· verb derivation id module,
· prefix id module.

3.1 TGL id module

This module will identify the verb inflections (tiṅ), class (gaṇa) and TAM
(lakāra) based on a database structure. Pāṇini uses 18 verb inflection
suffixes called tiṅ (P 3.4.78) (9: 3x3 for parasmai and 9: 3x3 for ātmane)
for verbs in the different gaṇas (bhvādi, adādi, juhotyādi, divādi, svādi,
tudādi, rudhādi, tanādi, kryādi and curādi) and tenses (laṭ, loṭ, laṅ,
vidhiliṅ, lṛṭ, luṭ, āśīrliṅ, lṛṅ, liṭ and luṅ). A root may conjugate in
ātmanepada, in parasmaipada, or in both (ubhayapadī). The database fragment
for identifying the suffixes is as follows:

suf_id   gaṇa_id   tense_id   paras_id   ātmane_id
tip      1                    ti         te
tip      1         9          a          e

Table 1
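A minimal sketch of how a fragment like Table 1 might be held in a relational
database and queried in reverse (from the end of a verb form to candidate
suffix rows) is given below. It uses an in-memory SQLite database purely for
illustration; the table name, the ASCII column names, the function
match_endings and the tense_id value 1 in the first row (the cell is missing
in Table 1) are assumptions, not part of the proposed system.

```python
import sqlite3

# In-memory database holding the two rows shown in Table 1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE suffix (suf_id TEXT, gana_id INTEGER, tense_id INTEGER,"
    " paras TEXT, atmane TEXT)"
)
conn.executemany(
    "INSERT INTO suffix VALUES (?, ?, ?, ?, ?)",
    [
        ("tip", 1, 1, "ti", "te"),  # tense_id 1 is assumed (missing in Table 1)
        ("tip", 1, 9, "a", "e"),
    ],
)


def match_endings(form):
    """Return candidate (suf_id, gana_id, tense_id, pada, ending) rows.

    A row is a candidate when the verb form ends with its parasmai or
    atmane ending; longer endings are listed first.
    """
    hits = []
    query = "SELECT suf_id, gana_id, tense_id, paras, atmane FROM suffix"
    for suf_id, gana_id, tense_id, paras, atmane in conn.execute(query):
        for pada, ending in (("parasmai", paras), ("atmane", atmane)):
            if form.endswith(ending):
                hits.append((suf_id, gana_id, tense_id, pada, ending))
    return sorted(hits, key=lambda h: len(h[-1]), reverse=True)


print(match_endings("bhavati"))
print(match_endings("labhate"))
```

A form such as labhate matches both the ending te and the shorter ending e,
which is why the candidates are ordered by length and why later modules still
have to confirm the guess against the root database.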

Some verb forms may return ambiguous results. For example, the √bhū form for
laṅ lakāra, parasmaipada, prathama puruṣa, bahuvacana is obtained by adding
'jhi', which changes to 'an'; in the √bhū form for sāmānyabhūta (luṅ lakāra),
parasmaipada, prathama puruṣa, bahuvacana, 'jhi' also changes to 'an'. All
such cases are stored separately, as shown in the following table:

root   gaṇa   tid   P   N   pada   form
bhū    1      3     1   3   P      abhavan
bhū    1      10    1   3   P      abhūvan

Table 2

This table will also be used for storing other irregular forms.
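As a sketch of how the separately stored forms might be consulted, the
following keeps the two rows of Table 2 in memory and looks a form up by exact
match before any suffix-based guess is made. The class name StoredForm and the
helper names are hypothetical, and the tense labels for tid 3 and 10 follow
the order in which the lakāras are listed in section 3.1.

```python
from typing import NamedTuple


class StoredForm(NamedTuple):
    root: str
    gana: int
    tid: int      # tense id
    person: int
    number: int
    pada: str
    form: str


# The two rows of Table 2.
STORED = [
    StoredForm("bhū", 1, 3, 1, 3, "P", "abhavan"),
    StoredForm("bhū", 1, 10, 1, 3, "P", "abhūvan"),
]

# Assumed mapping, taken from the order of the lakara list in section 3.1.
TENSE_NAMES = {3: "laṅ", 10: "luṅ"}


def lookup(form):
    """Return every stored row whose surface form matches exactly."""
    return [row for row in STORED if row.form == form]


def tenses_sharing_ending(ending):
    """Return the tense ids of stored forms that end in the given string."""
    return sorted({row.tid for row in STORED if row.form.endswith(ending)})


# The ending 'an' alone cannot decide between the two tenses ...
print([TENSE_NAMES[t] for t in tenses_sharing_ending("an")])
# ... but the full stored form can.
print([TENSE_NAMES[row.tid] for row in lookup("abhavan")])
```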

3.2 Verb Derivation ID

Since a verb can take any one of the 12 derivational suffixes under different
semantic conditions, it will be necessary to identify and isolate those
suffixes. We are storing all 2000 roots and their forms with the 12
derivational suffixes, as shown below:

root   gaṇa   san         ṇic         kyaṅ   kvip
bhū    1      bubhūṣati   bhāvayati   -

The problem of nāmadhātus will be handled by searching the Monier-Williams
Digital Dictionary [11] if the remaining root is not found in our basic verb
root database.

3.3 Prefix ID

There are 22 prefixes in Sanskrit which can be found at the beginning of
verb-form strings. The database for identifying the prefixes is as follows:

no_id   pre_id
1       pra
2       parā
3       apa

Table 3
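Prefix identification against such a table could be sketched as a simple
longest-match test at the start of the string. The example below uses only the
three entries shown in Table 3 (with the second read as parā); the function
name split_prefix is our own, and sandhi between prefix and verb is ignored
here, which a real module would have to handle.

```python
# Fragment of the prefix table: only the three entries shown in Table 3.
PREFIXES = {1: "pra", 2: "parā", 3: "apa"}


def split_prefix(form):
    """Split a candidate prefix off the start of a verb form.

    Returns (prefix, remainder), or (None, form) when no listed prefix
    matches. Longer prefixes are tried first. The result is only a
    candidate: the remainder must still be checked against the root data.
    """
    for prefix in sorted(PREFIXES.values(), key=len, reverse=True):
        if form.startswith(prefix) and len(form) > len(prefix):
            return prefix, form[len(prefix):]
    return None, form


print(split_prefix("prabhavati"))  # candidate prefix 'pra' plus 'bhavati'
print(split_prefix("bhavati"))     # no listed prefix matches
```

String matching alone can over-split: a form such as prathate (root prath)
also begins with pra, so a prefix hit should be accepted only when the
remainder leads to a known root.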

Thus after passing through all the above modules, the verb form will be

correctly tagged for all morpho-syntactic information resulting in root and

affix analysis.

4. Sample illustration

The following examples illustrate the proposed processing of Sanskrit verbs in
reverse by applying Pāṇinian verb vyutpatti procedures:

Example 1: apāśayat
Module 1 : apāśaya-t : laṅ, curādi, ubhaya, a-prefixed as past
Module 2 : paś-ṇic : causative of paś
Module 3 : does not apply (no prefixes)
Result : {([a]laṅ [paś]VR [aya]caus) [t]laṅ}

Example 2: prabhavati
Module 1 : prabhava-ti : laṭ, bhvādi, parasmai
Module 2 : does not apply
Module 3 : pra (prefix)
Result : {([pra]Pre [bhū]VR) [ti]laṭ}

Example 3: pipṛcchiṣati
Module 1 : pipṛcchiṣa-ti : laṭ, tudādi, parasmai
Module 2 : pipṛcch-iṣa : sannanta of pracch
Module 3 : does not apply (no prefixes)
Result : {([pracch]VR [iṣa]san) [ti]laṭ}
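To make the flow through the modules concrete, here is a minimal end-to-end
sketch for Example 2 only. All three lookup tables are tiny hypothetical
stand-ins for the database (one ending, one prefix, one stem), the derivation
module is skipped because it does not apply to this form, and the function
names analyze and bracket are ours.

```python
# Hypothetical one-row stand-ins for the database tables.
ENDINGS = {"ti": ("laṭ", "parasmai")}    # ending -> (lakara, pada)
PREFIXES = ("pra",)                      # known prefixes
STEMS = {"bhava": ("bhū", "bhvādi")}     # inflectional stem -> (root, gana)


def analyze(form):
    """Analyze a verb form in reverse: ending, then prefix, then root."""
    # Module 1: identify the inflection, longest ending first.
    ending = next(
        (e for e in sorted(ENDINGS, key=len, reverse=True)
         if form.endswith(e)),
        None,
    )
    if ending is None:
        return None
    lakara, pada = ENDINGS[ending]
    body = form[: -len(ending)]
    # Module 3: accept a prefix only if the remainder is a known stem.
    prefix = next(
        (p for p in PREFIXES
         if body.startswith(p) and body[len(p):] in STEMS),
        None,
    )
    stem = body[len(prefix):] if prefix else body
    if stem not in STEMS:
        return None
    root, gana = STEMS[stem]
    return {"prefix": prefix, "root": root, "gana": gana,
            "ending": ending, "lakara": lakara, "pada": pada}


def bracket(analysis):
    """Render an analysis in the bracket notation used in the examples."""
    pre = f"[{analysis['prefix']}]Pre " if analysis["prefix"] else ""
    core = f"[{analysis['root']}]VR"
    tail = f"[{analysis['ending']}]{analysis['lakara']}"
    return "{(" + pre + core + ") " + tail + "}"


print(bracket(analyze("prabhavati")))
```

Checking the remainder against the stem table before accepting a prefix is
one simple way to realize the step of determining the root by weeding out
other elements and matching against the database.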

5. Problems

The following problems and difficulties are foreseen:

· Some Sanskrit forms can be tagged ambiguously as both nouns and verbs. For
example, rāmaḥ can be an inflected noun obtained from the base rāma or a form
derived from the verb root √ramu by adding the ghañ suffix. Such cases should
ideally be referred to an etymological parser and not to a morphological
parser such as the present system. The proposed system will analyze the
actual usage (laukika) of the language as found in day-to-day language and
texts.
· Verb suffixes resulting in the same forms for different tense and aspect
conditions (the example of √bhū in section 3.1 above) will certainly pose a
problem in disambiguation. The authors propose to list such constructions
separately, as described in section 3.1.
· The verb modifier 'sma' (which gives present-tense forms a past sense) will
have to be handled separately. In this case, the system will parse the verb
morphologically as present tense and then add an overall tag of past.
· Though we have given a solution for handling nāmadhātus, some of them may
not be appropriately tagged because names are open-ended and dictionaries may
not list all of them.
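The treatment of 'sma' described above could be sketched as a small
post-processing pass over already tagged words. The data layout (a list of
word and tag pairs), the tag names and the assumption that 'sma' immediately
follows the verb it modifies are all ours, chosen only to illustrate the
idea.

```python
def apply_sma(tagged):
    """Add an overall past tag to a present-tense verb followed by 'sma'.

    tagged is a list of (word, tags) pairs, where tags is a dict for
    verbs and None for other words. The morphological lakara tag is kept
    unchanged; only an extra 'overall_tense' tag is added.
    """
    result = []
    for i, (word, tags) in enumerate(tagged):
        next_word = tagged[i + 1][0] if i + 1 < len(tagged) else None
        if tags and tags.get("lakara") == "laṭ" and next_word == "sma":
            tags = {**tags, "overall_tense": "past"}
        result.append((word, tags))
    return result


sentence = [("vasati", {"lakara": "laṭ"}), ("sma", None)]
print(apply_sma(sentence))
```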

6. Conclusion

The aim of this system is a comprehensive analysis of Sanskrit verbs, keeping
in view its usefulness for POS tagging and MT from Sanskrit. It is going to be
an online system with a front end built on Java servlets and a back end in a
relational database server. The system can easily be connected to any
Sanskrit-Indian language MT system for correct analysis of the source
sentences. This proposal needs feedback and discussion to make it really
comprehensive and foolproof, so that all kinds of Sanskrit verbs can be
analyzed correctly.

7. References

1. Prajna system, ASR Melkote,
http://www.sanskritacademy.org/Achievements.htm
2. Aiba, Verb Analyzer for classical Sanskrit,
http://www-asia.human.is.tohoku.ac.jp/demo/vasia/html/
3. Desika, TDIL, Govt. of India,
http://tdil.mit.gov.in/download/Desika.htm
4. RCILTS, JNU, http://rcilts.jnu.ac.in
5. Shabdabodha, ASR, Melkote, http://vedavid.org/ASR/#anchor2
6. Cardona, George, 2004, 'Some Questions on Pāṇini's Derivational System',
in SPLASH proc. of iSTRANS, pp. 3.
7. Whitney, W.D., 2002, 'History of Sanskrit grammar', Sanjay Prakashan,
Delhi.
8. Mishra, Sudhir K, Jha, Girish N, 2004, 'Sanskrit Kāraka Analyzer for
Machine Translation', in SPLASH proc. of iSTRANS, pp. 224-225. New Delhi.
9. Edgren, A. H., 1885, 'On the verbal roots of the Sanskrit language and of
the Sanskrit grammarians', Journal of the American Oriental Society 11: 1-55.
10. Joshi, S. D., 1962, 'Verbs and nouns in Sanskrit', Indian Linguistics 32:
60-63.
11. Bontes, Louis, 2001, Digital Dictionary Sanskrit to English of Monier
Williams.
