Using IT to unravel ancient wisdom contained in manuscripts
M A Siraj, April 3, 2013: 22:54 IST
Much of India’s traditional wisdom lies shrouded in classical languages, in manuscripts that are turning brittle and are at risk of being lost. Most of these manuscripts are privately owned, and their owners tend to be secretive and possessive. Conservative estimates put their number at around one million.
But Peter Scharf, professor of Sanskrit at Brown University in the US, says the number may range from one million to five million. According to Dr N V Ramachandran, director, Asian Classics Input Project, Palakkad, the oldest known palm-leaf manuscript dates from the 6th century, while the oldest paper manuscript belongs to the 10th century. B Krishnamurthy, director (strategy), Vyoma Linguistic Labs Foundation, Bangalore, says his search for Sanskrit manuscripts on the Internet has revealed 434 sources of extant manuscripts in India.
With the IT revolution opening up access to even the remotest sources of knowledge, there is huge demand to bring all traditional Indian knowledge into the public domain. Most of it is in Sanskrit, but a considerable portion is in other classical languages too. The NDA government led by Atal Behari Vajpayee took the first visionary step in this direction by setting up the Traditional Knowledge Digital Library (TKDL) in 2002. Since then, some headway has been made in transcribing manuscripts, mainly those belonging to the Ayurveda, Unani and Siddha systems of medicine. But much remains to be done.
Dr Girish Jha of Jawaharlal Nehru University says these manuscripts could have greater relevance in three important modern sectors, namely Vimana Shastra (Aeronautics), Metallurgy and Cosmology. As far as food and medicine are concerned, their relevance has been recognised since antiquity. It was on this basis that Indian scientists fought the overseas patenting of turmeric and were instrumental in returning the credit to India. In fact, CFTRI, Mysore, under its former chairman Dr V Prakash, was able to expedite the patenting of hundreds of Indian foods, treatments and herbal medicines during the last 15 years on the basis of manuscripts. According to Dr Darshan Shankar, Vice Chairman, Institute of Ayurvedic & Integrative Medicine, Bangalore, a rough estimate suggests that more than 50,000 manuscripts pertaining to Ayurveda lie untapped and uncatalogued.
Information technology, with its myriad software tools, does offer an answer to the challenge of moving this concealed knowledge into the public domain. Prof Scharf says there is an urgent need to focus on using IT to create a worldwide network of databases of ancient Indian manuscripts, so that anyone, in any part of the world, could easily look up any word, phrase, sentence or statement in any digitised manuscript housed anywhere in the world. He envisages a system in which manuscript owners, whether individuals or institutions, could produce online catalogues of the manuscripts they hold, using any of the open source cataloguing software currently available, and upload them to this worldwide network.
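The kind of lookup Prof Scharf describes, finding a word or phrase across catalogues held by different owners, is commonly built on an inverted index. The following is only a minimal sketch under stated assumptions: the catalogue entries, record identifiers and function names are invented for illustration and do not describe any existing system, and the search returns records that contain all the query words rather than the exact phrase.

```python
from collections import defaultdict


def build_index(catalogue):
    """Map each lower-cased word to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, text in catalogue.items():
        for word in text.lower().split():
            index[word].add(record_id)
    return index


def search(index, phrase):
    """Return the ids of records that contain every word in the phrase."""
    words = phrase.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Hypothetical catalogue entries uploaded by three different owners.
catalogue = {
    "library-a/ms-001": "Charaka Samhita palm leaf Ayurveda",
    "library-b/ms-417": "Ayurveda commentary paper manuscript",
    "owner-c/ms-009": "Vedic recitation manual palm leaf",
}

index = build_index(catalogue)
print(sorted(search(index, "palm leaf")))
```

In a distributed network of the sort proposed, each participant could build such an index locally and expose it through a shared query protocol, so no central authority would need to hold the manuscripts themselves.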
A three-day national seminar on ‘Application of Information Technology for conservation, editing and publication of manuscripts’, held recently in Bangalore, called for a distributed IT platform for manuscripts, built on interoperability protocols and exercising no control over participating individuals and institutions. According to Prof M A Lakshmithathachar, founder chairman of the Academy of Sanskrit Research, Melkote, digitising a single manuscript might take two years of work by a tech-savvy Sanskrit scholar.
The country cannot afford this and will have to look for alternative strategies. He suggests developing speech-to-text software that produces machine-readable text, which would bring down the time, energy and cost by 80 per cent. Prof Thathachar says that Sanskrit has retained a uniform phonetic intonation remarkably well through the millennia, thanks to Vedic recitation under the Gurukula system, and that this greatly raises the hopes of success for speech-to-text software. However, the process would not be hassle-free, as manuscripts have followed varied scripts, modes and styles of writing in different centuries.
Some experts also emphasise the use of Optical Character Recognition (OCR) for the digitisation project. Prof A G Ramakrishna of the Indian Institute of Science, who has developed OCR software for Tamil, says that his team has been able to digitise 200 books in Tamil using this software. The team is also working on Tamil/Kannada TTS (Text to Speech) software. However, OCR can deal with text alone, not pictures, so pictures have to be removed before it is used. He recommends developing good Devanagari-based OCR software that covers all the rare characters and diacritical marks, and augmenting it with a phoneme-based speech recognition system.
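One practical way to see which rare characters and marks an OCR system would need to cover is to take an inventory of the Unicode characters in text that has already been transcribed. The sketch below is illustrative only: the function names are hypothetical, the sample string is a short placeholder, and the range U+0900 to U+097F is the core Devanagari block in Unicode. It relies solely on Python's standard `unicodedata` module.

```python
import unicodedata
from collections import Counter


def character_inventory(text):
    """Count each non-space character, keyed by its Unicode character name."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return {
        unicodedata.name(ch, f"U+{ord(ch):04X}"): n
        for ch, n in counts.items()
    }


def outside_devanagari(text):
    """List characters that fall outside the core Devanagari block."""
    return sorted(
        {
            ch
            for ch in text
            if not ch.isspace() and not (0x0900 <= ord(ch) <= 0x097F)
        }
    )


# Placeholder sample; a real inventory would be run over whole transcriptions.
sample = "ॐ नमः"
for name, count in character_inventory(sample).items():
    print(count, name)
```

Characters reported by `outside_devanagari`, such as Vedic accent marks encoded in other Unicode blocks, are the ones a Devanagari-only character set would miss.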
Dr P Ramanujan, assistant director at C-DAC, points out lacunae in speech-to-text software. He says key elements of oral teaching, such as correct pronunciation, intonation and ‘bhava’, are not found in the print medium, and that this gap could be made up through e-learning, where voice, visuals and so on can be added. Ramanujan has been instrumental in developing the Unicode Manuscript Editor at C-DAC, which enables comparative analysis of various manuscripts.
The workload could be drastically reduced, as various versions of the same manuscript are found in different libraries. He cites the example of a particular text of which he was able to gather 35 versions from different sources, including four versions of the same text from one library alone. C-DAC has put about 15,000 images pertaining to 100 manuscripts, together with the manuscript editor, on the website www.parankusa.org, which has several Vedic texts with exhaustive commentaries and hyperlinks. But there are several milestones still to be reached.
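Comparing several versions of the same text, as described above, amounts to aligning the versions word by word and listing where they differ. The sketch below shows one possible approach using Python's standard `difflib` module. It is not a description of how the C-DAC editor works, the function name is hypothetical, and the two sample readings are invented for illustration and are not actual manuscript variants.

```python
import difflib


def variant_readings(base, witness):
    """List the places where a witness differs from the base text.

    Each result is (kind, base_words, witness_words), where kind is
    'replace', 'delete' or 'insert' as reported by difflib.
    """
    base_words = base.split()
    witness_words = witness.split()
    matcher = difflib.SequenceMatcher(
        a=base_words, b=witness_words, autojunk=False
    )
    variants = []
    for kind, i1, i2, j1, j2 in matcher.get_opcodes():
        if kind != "equal":
            variants.append(
                (
                    kind,
                    " ".join(base_words[i1:i2]),
                    " ".join(witness_words[j1:j2]),
                )
            )
    return variants


# Invented readings, romanised without diacritics for simplicity.
base = "agnim ile purohitam yajnasya devam"
witness = "agnim ide purohitam yajnasya devam ritvijam"

for kind, old, new in variant_readings(base, witness):
    print(kind, repr(old), "->", repr(new))
```

With 35 versions of a text, the same comparison could be run between a chosen base version and each of the others, so that a scholar reviews only the differing passages instead of re-reading every copy in full.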
Girish Jha says that once the issues with OCR are resolved, work needs to progress on text readers, search engines, inter-linking of data sources, translation software and pronunciation analysers before a real breakthrough can be made in the exhaustive digitisation of ancient wisdom.