Guest guest Posted August 7, 2008

Dear list members,

Now that we're all using Kyoto-Harvard transliteration, perhaps it's time to look at another aspect of Sanskrit encoding: not so much how best to communicate Sanskrit between human beings, but how best to encode Sanskrit for computer processing, analysis and study.

An encoding of a language, to work ideally with computers, needs to have two characteristics:

1) One letter of the encoding needs to correspond to one letter of the language.
2) The encoding needs to sort naturally in the language's alphabetic order. For example, in an English font a is encoded as 97, b as 98, c as 99, etc., so text naturally sorts as a, b, c, etc.

None of the encodings in popular use today, not even Unicode, has both of these characteristics, and this complicates computer processing of Sanskrit. I do see, though, that one encoding, SLP1, does have a one-to-one correspondence between letters of the encoding and letters of Sanskrit.

To give a very simple example, suppose you had a database of catalog records of Sanskrit texts and their authors, and you wanted to be able to print out the texts sorted by either author or title in correct Sanskrit letter order. You would then need to have the author and the title in the database twice: once in an encoding corresponding to a font, to display it, and once in an encoding ordered in the correct Sanskrit alphabetic sequence, to sort on. Or else you'd have to program some special sorting routines in your program.

Or another simple example: suppose you have a word and you want to find all words in a text that differ from it by one letter. You use what are called "regular expressions". For example, in English, to find all cases of g-ve (i.e. give, gave, etc.) you would input to the computer program the simple phrase |g.ve| and that would return all cases of give, gave, etc. But to do the same thing in Sanskrit is more complicated, because ai and au each represent only one letter in Sanskrit.

For example, |v.rya| would return virya but not vairya, because to your computer program ai is two letters. But |v..rya| would return vairya but not virya. Similarly, |v(i|ai)rya| would return virya and vairya but miss all misspellings of the word, so even in Unicode you need to put in the more complicated |v(.|ai|au)rya| to get all correct or misspelled cases.

These are of course very simple examples that can easily be gotten round, but they do illustrate in a simple way the complications that the present encodings of Sanskrit present. At present we probably have only a few thousand digitized Sanskrit e-texts, but the way things are going it's quite possible that in a few years' time there may be a huge corpus of digitized Sanskrit texts available for computer analysis of all kinds: analysis of subtle differences in word counts and usage to help identify whether an attributed author is in fact the author, analysis to see if a text was written by one author or many, computerized finding of parallel passages, computerized help in creating concordances, etc.

All of this is much easier if you have an encoding with the two characteristics mentioned earlier, i.e.:

1) A one-to-one correspondence between a letter of the encoding and a letter of the language.
2) An encoding that collates (i.e. sorts naturally) in the sort order of the language's letters.

Regards,
Harry Spier
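The two problems described in the post above (multi-character letters defeating a one-letter wildcard, and code order not matching Sanskrit alphabetical order) can both be worked around by recoding text internally before processing it. What follows is only a rough Python sketch of that idea, not an established encoding. It assumes Harvard-Kyoto input, and the private-use code points and all function names are made up for illustration. Greedy longest-match splitting is also a simplification: it will misread the rare cases where, say, a and i stand next to each other as two separate letters, and it does not attempt the special dictionary treatment of anusvara and visarga.

```python
import re

# Harvard-Kyoto letters listed in traditional Sanskrit alphabetical order.
HK_LETTERS = [
    "a", "A", "i", "I", "u", "U", "R", "RR", "lR", "lRR",
    "e", "ai", "o", "au", "M", "H",
    "k", "kh", "g", "gh", "G",
    "c", "ch", "j", "jh", "J",
    "T", "Th", "D", "Dh", "N",
    "t", "th", "d", "dh", "n",
    "p", "ph", "b", "bh", "m",
    "y", "r", "l", "v",
    "z", "S", "s", "h",
]

# Hypothetical internal code (an assumption, not a standard): the n-th letter
# becomes the single private-use character chr(BASE + n). This gives one
# character per letter, and code-point order equals alphabetical order.
BASE = 0xE000
TO_INTERNAL = {letter: chr(BASE + n) for n, letter in enumerate(HK_LETTERS)}
FROM_INTERNAL = {code: letter for letter, code in TO_INTERNAL.items()}

# Longest alternatives first, so "ai" wins over "a" and "lRR" over "lR" or "l".
_LETTER_RE = re.compile(
    "|".join(sorted(map(re.escape, HK_LETTERS), key=len, reverse=True))
)


def to_internal(hk_text):
    """Recode HK text letter by letter; non-letters pass through unchanged."""
    return _LETTER_RE.sub(lambda m: TO_INTERNAL[m.group(0)], hk_text)


def from_internal(text):
    """Convert internally coded text back to Harvard-Kyoto."""
    return "".join(FROM_INTERNAL.get(ch, ch) for ch in text)


def sanskrit_sorted(hk_words):
    """Sort HK words in Sanskrit letter order via the internal code."""
    return sorted(hk_words, key=to_internal)


def find_matches(hk_pattern, hk_words):
    """Match a regex written in HK, where '.' means one Sanskrit letter."""
    # '.' is not an HK letter, so it survives recoding as a regex wildcard.
    pattern = re.compile(to_internal(hk_pattern))
    return [w for w in hk_words if pattern.fullmatch(to_internal(w))]


words = ["virya", "vairya", "vrya"]
print([w for w in words if re.fullmatch("v.rya", w)])  # ['virya']
print(find_matches("v.rya", words))                    # ['virya', 'vairya']
print(sanskrit_sorted(["ga", "kha", "ka", "au", "ai", "A", "a"]))
```

With something like this, a database would only need to store each title once: the sort key and the one-letter-per-character form can be derived on the fly instead of being kept as a second copy.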
Guest guest Posted August 20, 2008

Until the ideal encoding is invented and becomes prevalent, it would be nice if every Sanskrit text carried a standard marker identifying the encoding used. That way computer programs would at least have a chance to adapt to the various encodings that are less than perfect.

There are two steps to this goal:

1. Create a list of abbreviations for all encodings in electronic use.
2. Update existing files with a marker that uses the corresponding abbreviation.

For example:

1. HK for Harvard-Kyoto
   CSX for CSX
   UTF8 for Unicode-8-bit
   REE for REE
   ... etc.
2. Let the marker be something standard and simple, like %%##skt-encoding=HK##%%, placed somewhere at the beginning of a document.

I think that creating abbreviations for the existing encodings and choosing a marker might be accomplished on this forum.

Regards,
Dmitri.
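As a rough illustration of how a program might use the marker proposed above, here is a minimal Python sketch. The marker syntax and the four abbreviations are taken from the post; everything else is an assumption: the function names, the rule that the abbreviation consists of letters, digits, hyphens and underscores, and the 1024-character window standing in for "somewhere at the beginning of a document".

```python
import re

# Marker syntax as proposed in the post: %%##skt-encoding=HK##%%
# Assumption: abbreviations are made of letters, digits, '-' and '_'.
MARKER_RE = re.compile(r"%%##skt-encoding=([A-Za-z0-9_-]+)##%%")

# Abbreviations suggested in the post; the list is meant to be extended.
KNOWN_ENCODINGS = {
    "HK": "Harvard-Kyoto",
    "CSX": "CSX",
    "UTF8": "Unicode-8-bit",
    "REE": "REE",
}


def detect_skt_encoding(text, search_limit=1024):
    """Return the abbreviation from a marker near the start of text, or None.

    search_limit is a hypothetical cut-off for "the beginning of a document".
    """
    match = MARKER_RE.search(text[:search_limit])
    return match.group(1) if match else None


def add_skt_marker(text, abbreviation):
    """Prepend a marker line unless the text already carries one."""
    if detect_skt_encoding(text) is not None:
        return text
    return "%%##skt-encoding={}##%%\n{}".format(abbreviation, text)


sample = add_skt_marker("dharmakSetre kurukSetre", "HK")
code = detect_skt_encoding(sample)
print(code, "->", KNOWN_ENCODINGS.get(code, "unknown encoding"))
```

A program reading a file would call detect_skt_encoding first and fall back to guessing only when it returns None or an abbreviation it does not recognize.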
Guest guest Posted August 31, 2008

Re: Computer representation of Sanskrit

I find that the best computer representation of Sanskrit, Hindi, and a few other Indian languages is "Sanskrit 2003", at least for IBM machines.

I am teaching myself basic Hindi. I already know the Devanagari script for reading Hindi and Sanskrit, and I already have a basic vocabulary of 25 basic expressions in Hindi. I intend one day to be fluent in Hindi and Sanskrit. I plan to travel to India and spend a couple of months in an ashram studying the Vedic scriptures and meditating.

Ashok Aklujkar and his wife and daughter (Rasika), I believe, were the ones who encouraged me to learn Hindi. I very much enjoyed the Butterfly and Bee dance story in Surrey during the BCACL 2008 conference. It really opened my eyes to Indian and Hindu culture.

All the best,
Lyle Lexier