LINGUIST List 18.1455
Mon May 14 2007
Diss: Computational Ling/Text&Corpus Ling/Translation: Chandra: 'Ma...'
Editor for this issue: Hunter Lockwood
Machine Recognition and Morphological Analysis of Subanta-Padas
Message 1: Machine Recognition and Morphological Analysis of Subanta-Padas
From: Subhash Chandra <subhash.jnugmail.com>
Subject: Machine Recognition and Morphological Analysis of Subanta-Padas
Institution: Jawaharlal Nehru University, New Delhi
Program: Special Centre for Sanskrit Studies (SCSS)
Dissertation Status: Completed
Degree Date: 2006
Author: Subhash Chandra
Dissertation Title: Machine Recognition and Morphological Analysis of Subanta-Padas
Linguistic Field(s): Computational Linguistics
Subject Language(s): Sanskrit (san)
Dissertation Director: Girish Nath Jha
The Indian Heritage Group of the Centre for Development of Advanced
Computing (CDAC) has developed a system called DESIKA, which claims to
process all the words of Sanskrit and includes generation and analysis
(parsing). The Rashtriya Sanskrit Vidyapeeth, Tirupathi, under the
leadership of Prof. K. V. Ramakrishnamacharyulu (currently Vice Chancellor
of Rajasthan Sanskrit University), has done commendable work on the
Sansk-net project. Prof. Vineet Chaitanya and Amba Kulkarni are visiting
the institution and are currently guiding several Sanskrit R&D initiatives
with far-reaching consequences.
The Academy of Sanskrit Research, Melkote, Mysore, has been actively
involved in bringing scholars doing technology R&D for Sanskrit and
shAstras together on a single platform.
The Special Centre for Sanskrit Studies, Jawaharlal Nehru University, New
Delhi, is currently engaged in the following R&D: a kAraka analyzer, a sandhi
splitter and analyzer, a verb analyzer, NP gender agreement, POS tagging of
Sanskrit, an online multilingual amarakosha, a search engine for Panini's
AshTAdhyAyI, and online MahAbhArata indexing. Jha (2006) presented a model of
a Sanskrit Analysis System (SAS). The RCILTS project under Prof. G.V. Singh
at the School of Computer and Systems Sciences has prepared useful
linguistic resources for Sanskrit.
Morphological analyzers for Sanskrit, Telugu, Hindi, Marathi, Kannada and
Punjabi have been developed by the Akshar Bharati group at the Indian Institute
of Technology, Kanpur, and the University of Hyderabad, funded by the Ministry of
Information Technology. The project claims to have 95% coverage for Telugu
(arbitrary text in modern standard Telugu) and 88% coverage for Hindi.
This system is available on the site for downloading as well as online at:
Anusaaraka (developed by the Akshar Bharati group, IIIT, Hyderabad) is a
software system that renders text from one Indian language into another,
a form of machine translation. It produces output that is comprehensible
to the reader, although at times it might not be grammatical. The system is
available at the IIIT Hyderabad site.
How is this work different?
The work is different from existing research in the following ways:
1. To date, no online RDBMS-based recognizer-analyzer has been available that
accepts input and displays results in Unicode Devanagari script; this system
takes Unicode Devanagari text and displays its results in Devanagari,
2. This system takes Devanagari UTF-8 text as input and delivers Devanagari
UTF-8 text as output using Java servlet, Apache Tomcat, JDBC and RDBMS technology,
3. gives a comprehensive computational analysis of subanta-padas in a
Sanskrit text, and does basic tagging of verbs and avyayas too,
4. uses a hybrid approach to process input text. It works on the
morphological nature of bases and applies the vibhakti information for
5. the system can be used in larger Sanskrit processing tasks, for text
simplification and machine translation.
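
As a rough illustration of how vibhakti information might be applied to a base (point 4), the following Java sketch strips a case ending from a Devanagari word and reports the remaining stem. It is not the dissertation's code: the class SubantaSketch, the method analyze and the six-entry ending table are hypothetical. The table covers only a few singular forms of masculine a-stems such as rAma, and the sketch ignores sandhi and ambiguity. Devanagari characters are written as Unicode escapes so that the source compiles regardless of file encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical suffix-stripping sketch; not the system described in the dissertation. */
class SubantaSketch {

    /** A stem together with the case/number label of the ending that was removed. */
    static final class Analysis {
        final String stem;
        final String label;

        Analysis(String stem, String label) {
            this.stem = stem;
            this.label = label;
        }

        @Override
        public String toString() {
            return stem + " + " + label;
        }
    }

    // Assumed mini rule base for masculine a-stems (singular only).
    // Insertion order is preserved, so longer endings are tried first.
    private static final Map<String, String> ENDINGS = new LinkedHashMap<>();
    static {
        ENDINGS.put("\u0938\u094D\u092F", "genitive singular");   // -sya
        ENDINGS.put("\u093E\u0924\u094D", "ablative singular");   // -At
        ENDINGS.put("\u093E\u092F", "dative singular");           // -Aya
        ENDINGS.put("\u092E\u094D", "accusative singular");       // -m
        ENDINGS.put("\u0903", "nominative singular");             // visarga
        ENDINGS.put("\u0947", "locative singular");               // -e
    }

    /** Returns the first matching analysis, or empty if no listed ending fits. */
    static Optional<Analysis> analyze(String word) {
        for (Map.Entry<String, String> entry : ENDINGS.entrySet()) {
            String ending = entry.getKey();
            // Require a non-empty stem to remain after stripping the ending.
            if (word.length() > ending.length() && word.endsWith(ending)) {
                String stem = word.substring(0, word.length() - ending.length());
                return Optional.of(new Analysis(stem, entry.getValue()));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String[] words = {
            "\u0930\u093E\u092E\u0938\u094D\u092F", // rAmasya
            "\u0930\u093E\u092E\u092E\u094D",       // rAmam
            "\u0930\u093E\u092E\u0903"              // rAmaH
        };
        for (String w : words) {
            System.out.println(w + " -> "
                    + analyze(w).map(Analysis::toString).orElse("no analysis"));
        }
    }
}
```

For the genitive form rAmasya, analyze returns the stem rAma with the label "genitive singular". A real analyzer would need full paradigms for every stem class and a way to report several candidate analyses for ambiguous endings.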
Summary of chapters
Chapter I discusses morphological analyzers, the current status of R&D in this
field, the structure and organization of the AshTAdhyAyI (AD), and subanta of
Chapter II discusses subanta formalism of Panini and mechanisms to
recognize verb, avyaya and subanta in Sanskrit text.
Chapter III discusses the analysis of subanta-padas.
Chapter IV discusses the implementation aspects: the front end, Java
objects, databases and linguistic resources (corpus, rule bases and example
bases), how they work, the basic requirements of the system, and how
to apply sandhi and subanta rules wherever necessary.
The conclusion discusses future R&D, the limitations of the system and an analysis of the results.
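
The recognition step summarized for Chapter II (telling verbs and avyayas apart from subanta candidates) could be sketched along the following lines. This is only an assumed outline, not the method the dissertation describes. The class PadaRecognizerSketch, the three-word avyaya list, the two verb endings and the order of the checks are all hypothetical, and in-memory collections stand in for the database-backed example bases and rule bases. Text is split on whitespace and on the danda and double danda.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical token tagger; in-memory lists stand in for the system's databases. */
class PadaRecognizerSketch {

    enum Tag { AVYAYA, VERB, SUBANTA_CANDIDATE }

    // Assumed example base of indeclinables: ca, eva, api.
    private static final Set<String> AVYAYAS =
            Set.of("\u091A", "\u090F\u0935", "\u0905\u092A\u093F");

    // Assumed verb endings: -nti and -ti (third person present), longest first.
    private static final List<String> VERB_ENDINGS =
            List.of("\u0928\u094D\u0924\u093F", "\u0924\u093F");

    /** Splits text on whitespace, danda (U+0964) and double danda (U+0965). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[\\s\u0964\u0965]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Checks the avyaya list first, then verb endings; anything else is a subanta candidate. */
    static Tag classify(String token) {
        if (AVYAYAS.contains(token)) {
            return Tag.AVYAYA;
        }
        for (String ending : VERB_ENDINGS) {
            if (token.length() > ending.length() && token.endsWith(ending)) {
                return Tag.VERB;
            }
        }
        return Tag.SUBANTA_CANDIDATE;
    }

    /** Tags every token of the input text in order. */
    static List<Tag> tagAll(String text) {
        List<Tag> tags = new ArrayList<>();
        for (String token : tokenize(text)) {
            tags.add(classify(token));
        }
        return tags;
    }

    public static void main(String[] args) {
        // Unsandhied sample: rAmaH vanam eva gacchati, followed by a danda.
        String sentence = "\u0930\u093E\u092E\u0903 \u0935\u0928\u092E\u094D "
                + "\u090F\u0935 \u0917\u091A\u094D\u091B\u0924\u093F\u0964";
        for (String token : tokenize(sentence)) {
            System.out.println(token + "\t" + classify(token));
        }
    }
}
```

The ending check is deliberately naive: a nominal form that happens to end in -ti would be tagged as a verb, which is one reason a practical system needs larger lexical resources than a short ending list.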