Guest, posted March 24, 2005

Dear Friends,

I am developing a system that identifies Sanskrit verbs by applying the grammar rules in reverse. I have written a paper on it for presentation at SIMPLE05 (IIT Kharagpur) and would like feedback. Kindly advise me on how to make it comprehensive and useful. A copy of the paper follows this message. All suggestions are welcome.

Deep regards,
Sudhir Mishra

Identifying Verb Inflections in Sanskrit Morphology

Sudhir Kumar Mishra, Girish Nath Jha
Special Center for Sanskrit Studies, J.N.U., New Delhi - 67
mishra_skumar78 girishj

Abstract

The paper presents a model for Sanskrit verb inflection identification that is intended to describe correctly the verbs in a laukika Sanskrit text. Verbs play an important role in the syntactico-semantic relations of a sentence. The overall idea is to identify and analyze Sanskrit verbs correctly so that any Sanskrit to Indian language machine translation system can benefit from this processing. The overall model of the system is as follows -

VERB FORMS
↓
INFLECTION ID
↓
TAM ID
↓
PREFIX ID
↓
VERB SPLITTING
↓
VERB ID

1. Introduction

Sanskrit verb forms are very complex. They carry tense, aspect, person and number information, all in the inflected forms. Besides, they can also contain derivations carrying semantic information such as causation, desire, repetition, negation etc. It therefore becomes very difficult to split the verb form and to separate the verb root from the complex information units encoded in it. Sanskrit has about 2000 verb roots classified into 10 morphological and semantic classes called gaṇas. These can be further sub-classified into normal forms (without any of the 12 derivational affixes: 11 listed by Pāṇini [P 3.1.32] and 1 more, ‘kvip’, added by Kātyāyana) and derived forms: ṇijanta (causative: ṇic), sannanta (expressing desire: san, kyac, kāmyac, kvip, kyaṅ, kyaṣ, ṇiṅ, yak, āy and īyaṅ) and yaṅanta (reduplicated: yaṅ and yaṅluganta).
Further, these can have ātmane and parasmai forms in 10 lakāras and 3 x 3 person and number combinations, and can also potentially be prefixed with 22 prefixes. Finally, there can be innumerable nāmadhātus (denominative verbs, i.e. verbs derived from nouns). By a rough calculation, the number of potential verb forms in Sanskrit is around 10,29,60,000 (2000 roots x 13 stem types x 10 lakāras x 2 padas x 9 person-number endings x 22 prefixes = 102,960,000), plus nāmadhātus. The distribution of Sanskrit verbs can be understood as follows -

VR [2000]
├ san
├ kyac
├ kāmyac
├ kvip
├ kyaṅ
├ kyaṣ
├ ṇiṅ
├ ṇic
├ yaṅ
├ yak
├ āy
└ īyaṅ
+ one normal form
↓
TAM [10 lakāras]
↓
┌────────────────┐
parasmai         ātmane
↓                ↓
10x9 forms       10x9 forms
↓                ↓
22 upasarga      22 upasarga

Therefore the approach followed by many, that of storing all Sanskrit verb forms, is not going to work. Hence we propose the Pāṇinian approach in reverse for parsing complex verb forms in Sanskrit, in the following sequence -
· the verb inflection (parasmai / ātmane) is identified from the database and a rough guess is made about the verb,
· the lakāra information is obtained from the inflection,
· each of the 12 derivational affixes is evaluated,
· each of the 22 upasargas (prefixes) is searched for,
· the verb root is determined by weeding out the other elements and matching against the database.

The verb root thus identified, together with all the other potential components and the grammatical information gathered in the form of verb tags, can potentially be used in a machine translation system translating from Sanskrit.

2. Previous work

Though a comprehensive machine analysis of Sanskrit verbs is still awaited, the following work, mostly under government funding, can be mentioned as relevant in this context -
· the Prajna project of ASR Melkote [1] claims to do module generation and analysis of 400 important Sanskrit roots in three voices (Active, Passive and Impersonal), 10 lakāras, 6 tenses and 4 moods,
· Aiba (2004) [2] claims to have developed a Verb Analyzer for classical Sanskrit which can parse Sanskrit verbs in Present, Aorist, Perfect, Future, Passive and Causative forms. The site actually works only for some verbs and itself acknowledges that the results are not reliable,
· the Desika project of TDIL, Govt. of India [3] claims to be an NLU system for generation and analysis of plain and accented written Sanskrit texts based on the grammar rules of Pāṇini's Aṣṭādhyāyī. It also claims to have a database based on the Amarakośa and heuristics based on the Nyāya and Mīmāṃsā śāstras, and claims to analyze Vedic texts as well,
· the RCILTS project at SC&SS, JNU [4] has reportedly stored all verb forms of Sanskrit in a database,
· the Śābdabodha project of TDIL, Govt. of India [5] claims to be an interactive application to analyze the semantic and syntactic structure of Sanskrit sentences,
· the ASR Melkote website reports that a Sanskrit Authoring System is under development at C-DAC Bangalore. The system is supposed to provide tools for morphological, syntactic and semantic analysis, with word-splitting programs for sandhi and samāsa [1].
· Cardona (2004) [6] discussed Pāṇini’s derivational system, involving aspects of linguistics, grammar and computer science.
· Whitney (2002) [7] listed all the quotable roots of the Sanskrit language together with the tense and conjugation system.
· Mishra and Jha (2004) [8] describe a module (Sanskrit Kāraka Analyzer) for the identification and description of kārakas according to Pāṇinian kāraka formulations.
· Edgren (1885) [9] discussed the verb roots of the Sanskrit language according to the Sanskrit grammarians.
· Joshi (1962) [10] presents a linguistic analysis of the verbs and nouns of the Sanskrit language.

The present work differs from the work mentioned above in that it identifies the verb by applying Pāṇini's rules in reverse with the help of a relational database, as outlined in section 1 (Introduction). The system can also be used to identify sentences as active or passive voice, with a complete description of the verb.

3. Module description

Pāṇini's verb formulations are very complex and involve a balanced interplay of morphological and syntactic information. We propose the following modules for Sanskrit verb analysis -
· TGL id (verb inflection, class and TAM id module),
· Verb derivation id module,
· Prefix id module.

3.1 TGL id module

This module will identify the verb inflection (tiṅ), class (gaṇa) and TAM (lakāra) based on a database structure. Pāṇini uses 18 verb inflection suffixes called tiṅ (P 3.4.78) (9: 3x3 for parasmai and 9: 3x3 for ātmane) for verbs in the different gaṇas (bhvādi, adādi, juhotyādi, divādi, svādi, tudādi, rudhādi, tanādi, kryādi and curādi) and tenses (laṭ, loṭ, laṅ, vidhiliṅ, lṛṭ, luṭ, āśīrliṅ, lṛṅ, liṭ and luṅ). A root may conjugate in ātmanepada, in parasmaipada, or in both (ubhayapadī). The database fragment for identifying the suffixes is as follows -

suf_id   gaṇa_id   tense_id   paras_id   ātmane_id
tip      1         1          ti         te
tip      1         9          a          e

Table 1

Some endings may return ambiguous results. For example, the laṅ lakāra parasmaipada prathama puruṣa bahuvacana form of √bhū is obtained by adding ‘jhi’, which changes to ‘an’, and in the sāmānyabhūta luṅ lakāra parasmaipada prathama puruṣa bahuvacana form of √bhū ‘jhi’ likewise changes to ‘an’.
All such cases are stored separately, as shown in the following table -

root   gaṇa   tid   P   N   pada   form
bhū    1      3     1   3   P      abhavan
bhū    1      10    1   3   P      abhūvan

Table 2

This table will also be used for storing other irregular forms.

3.2 Verb Derivation ID

Since a verb can carry one of the 12 derivational suffixes under different semantic conditions, it is necessary to identify and isolate those suffixes. We are storing all 2000 roots and their forms with the 12 derivational suffixes, as shown below -

root   gaṇa   san         ṇic         kyaṅ   kvip
bhū    1      bubhūṣati   bhāvayati   -      -

The problem of nāmadhātus will be handled by searching the Monier Williams Digital Dictionary [11] if the remaining root is not found in our basic verb root database.

3.3 Prefix ID

There are 22 prefixes in Sanskrit which can be found at the beginning of verb-form strings. The database for identifying the prefixes is as follows -

no_id   pre_id
1       pra
2       parā
3       apa

Table 3

Thus, after passing through all the above modules, the verb form will be tagged with all its morpho-syntactic information, resulting in a root and affix analysis.

4. Sample illustration

The following examples illustrate the proposed processing of Sanskrit verbs in reverse by applying Pāṇinian verb vyutpatti procedures -

Example 1: apāśayat
Module 1 : apāśaya-t : laṅ, curādi, ubhaya, a-prefixed as past
Module 2 : paś-ṇic : caus of paś
Module 3 : does not apply (no prefixes)
Result : {([a] laṅ [paś]VR [aya] caus) [t] laṅ}

Example 2: prabhavati
Module 1 : prabhava-ti : laṭ, bhvādi, parasmai
Module 2 : does not apply
Module 3 : pra (prefix)
Result : {([pra] Pre [bhū]VR) [ti] laṭ}

Example 3: pipṛcchiṣati
Module 1 : pipṛcchiṣa-ti : laṭ, tudādi, parasmai
Module 2 : pipṛcch-iṣa : sannanta of pracch
Module 3 : does not apply (no prefixes)
Result : {([pracch]VR [iṣa] san) [ti] laṭ}

5. Problems

The following problems / difficulties are foreseen -
· Some Sanskrit words can carry ambiguous tags, as both noun and verb.
For example, rāmaH can be an inflected noun obtained from the base rāma, or a verb-derived form obtained by adding the ghañ suffix to √ramu. Such cases should ideally be referred to an etymological parser and not to a morphological parser such as the present system. The proposed system will analyze the language in actual (laukika) usage, as found in day-to-day language and texts.
· Verb suffixes resulting in the same forms under varying tense and aspect conditions (the example of √bhū in section 3.1 above) will certainly pose a problem for disambiguation. The authors propose to list such constructions separately, as described in section 3.1.
· The reported-speech verb modifier ‘sma’ (which modifies present forms for past use) will have to be handled separately. In this case, the system will parse the verb morphologically as present tense and then add an overall tag of past.
· Though we have given a solution for handling nāmadhātus, some of them may not be appropriately tagged, because names are open-ended and dictionaries may not list all of them.

6. Conclusion

The aim of this system is a comprehensive analysis of Sanskrit verbs, with a view to its usefulness in POS tagging and MT from Sanskrit. It is going to be an online system with a front end in Java servlets and a back end in a relational database server. The system can easily be connected to any Sanskrit-Indian language MT system for correct analysis of the source sentences. This proposal needs feedback and discussion to make it really comprehensive and foolproof, so that all kinds of Sanskrit verbs can be analyzed correctly.

7. References

1. Prajna system, ASR Melkote, http://www.sanskritacademy.org/Achievements.htm
2. Aiba, Verb Analyzer for classical Sanskrit, http://www-asia.human.is.tohoku.ac.jp/demo/vasia/html/
3. Desika, TDIL, Govt. of India, http://tdil.mit.gov.in/download/Desika.htm
4. RCILTS, JNU, http://rcilts.jnu.ac.in
5. Shabdabodha, ASR, Melkote, http://vedavid.org/ASR/#anchor2
6. Cardona, George, 2004, ‘Some Questions on Pāṇini’s Derivational System’, in SPLASH proc. of iSTRANS, pp. 3
7. Whitney, W. D., 2002, ‘History of Sanskrit grammar’, Sanjay Prakashan, Delhi.
8. Mishra, Sudhir K., Jha, Girish N., 2004, ‘Sanskrit Kāraka Analyzer for Machine Translation’, in SPLASH proc. of iSTRANS, pp. 224-225, New Delhi.
9. Edgren, A. H., 1885, ‘On the verbal roots of the Sanskrit language and of the Sanskrit grammarians’, Journal of the American Oriental Society 11: 1-55.
10. Joshi, S. D., 1962, ‘Verbs and nouns in Sanskrit’, Indian Linguistics 32: 60-63.
11. Bontes, Louis, 2001, Digital Dictionary Sanskrit to English of Monier Williams.
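
Illustrative sketch. The reverse sequence proposed in sections 1 and 3 (ending first, then prefix, then stem and root lookup) might be sketched roughly as below. This is only a toy under stated assumptions and not the authors' Java and database implementation: the tables ENDINGS, PREFIXES and STEMS and the function analyze are hypothetical names, they hold only a handful of entries taken from the examples in the paper, and the stem table simply stands in for a real reversal of Pāṇinian stem formation.

```python
from typing import NamedTuple, Optional

# Toy data only (assumption): the proposed system would read these
# from its relational database.
# Surface ending -> (lakāra, pada); third person singular only.
ENDINGS = {
    "ti": ("laṭ", "parasmai"),
    "te": ("laṭ", "ātmane"),
}
# Three of the 22 upasargas, as in Table 3.
PREFIXES = ("pra", "parā", "apa")
# Conjugational stem -> (root, derivational affix or None).
# A plain lookup stands in for undoing the stem formation rules.
STEMS = {
    "bhava": ("bhū", None),
    "bhāvaya": ("bhū", "ṇic"),
}


class Analysis(NamedTuple):
    prefix: Optional[str]
    root: str
    derivation: Optional[str]
    lakara: str
    pada: str


def analyze(form: str) -> Optional[Analysis]:
    """Strip the ending, then an optional prefix, then look up the stem."""
    # Longer endings are tried first so a short one cannot shadow them.
    for ending in sorted(ENDINGS, key=len, reverse=True):
        if not form.endswith(ending):
            continue
        lakara, pada = ENDINGS[ending]
        body = form[: -len(ending)]
        # None means "no prefix" and is tried before the real prefixes.
        for prefix in (None, *PREFIXES):
            if prefix is None:
                stem = body
            elif body.startswith(prefix):
                stem = body[len(prefix):]
            else:
                continue
            if stem in STEMS:
                root, derivation = STEMS[stem]
                return Analysis(prefix, root, derivation, lakara, pada)
    # Nothing matched: the form is outside the toy tables.
    return None


if __name__ == "__main__":
    for word in ("prabhavati", "bhāvayati", "gacchati"):
        print(word, "->", analyze(word))
```

With these toy tables, prabhavati is split into the prefix pra, the stem bhava and the ending ti, while a form whose stem is not listed (gacchati here) is reported as unanalyzed. Ambiguous endings such as ‘an’ would need the separate listing of Table 2, which this sketch does not attempt.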