New Tamil search engine
There is good news for those who think the Internet is only for people who know English: the Anna University-K B Chandrasekar Foundation (AU-KBC) has developed a search engine for Tamil websites and a tool that translates English content into Tamil.
To know more about this, I met Prof C N Krishnan, director, AU-KBC, at the Madras Institute of Technology (MIT), Chromepet.
Some excerpts from his interview...
Please tell us about the Tamil search engine and how the project began?
The Tamil search engine project was started in 2001. The objective is the same as that of the Google search engine. Our search engine also searches for websites, but the main difference is that it searches only Tamil websites. The search engine is for the Tamil language alone.
Each language has some encoding; for English it is ASCII. For Tamil, there are more than 50 encodings, and the international convention has introduced the 'Unicode' coding standard. The idea is that all languages would follow the same encoding.
But during the 1990s, many people started new encodings. Google, which I mentioned earlier, searches only for sites following Unicode. Today's Tamil websites do not follow the Unicode pattern. There are a lot of coding patterns such as TAB, TAM, TSCII and many others.
Our search engine is capable of handling all types of encoding. If you enter a search in Google Tamil, you may receive only one website, because it searches only websites following Unicode. Our search engine will fetch you more than 100 websites, because it fetches content from all websites, irrespective of their encoding.
AU-KBC is the first in India to produce a search engine of this type, which searches websites in all encodings. Ours is a sophisticated and updated search engine. We have tested it ourselves and have put it on our website for people to try.
This won't search English websites, because it is customised to search only Tamil websites. There are 4,000 to 5,000 or more Tamil sites, a number that is unique among Indian languages.
There is no serious content as we have in English. Only some websites have current information while many websites have content on Tamil literature and many other things.
Trials have been carried out in some portals with the search engine. The search engine has been available for more than a year on the AU-KBC website.
Irrespective of what type of keyboard one uses, the search engine translates the input internally. It accepts more than 25 encodings at present and covers nearly 90 per cent of keyboard styles.
If your system has TAB encoding and you use the same encoding for searching, our in-built tool would change it into the website's encoding and start the search and return the result in your TAB encoding.
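The round trip Prof Krishnan describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not AU-KBC's implementation: the encoding names `toy-tab` and `toy-tscii`, their byte values, and the functions `decode`, `encode` and `search` are all hypothetical, and the tables cover just the five characters of one Tamil word. The conversion between encodings passes through Unicode, which is also an assumption.

```python
# Toy byte-to-character tables. The byte values are placeholders chosen for
# this example; they are NOT the real TAB or TSCII code charts.
TOY_TABLES = {
    "toy-tab": {0xA1: "\u0ba4", 0xA2: "\u0bae", 0xA3: "\u0bbf",
                0xA4: "\u0bb4", 0xA5: "\u0bcd"},
    "toy-tscii": {0xB1: "\u0ba4", 0xB2: "\u0bae", 0xB3: "\u0bbf",
                  0xB4: "\u0bb4", 0xB5: "\u0bcd"},
}


def decode(data: bytes, encoding: str) -> str:
    """Turn bytes in a toy encoding into text; ASCII bytes pass through."""
    table = TOY_TABLES[encoding]
    return "".join(chr(b) if b < 0x80 else table[b] for b in data)


def encode(text: str, encoding: str) -> bytes:
    """Turn text into bytes in a toy encoding; ASCII passes through."""
    reverse = {ch: b for b, ch in TOY_TABLES[encoding].items()}
    return bytes(ord(ch) if ord(ch) < 0x80 else reverse[ch] for ch in text)


def search(query: bytes, user_encoding: str, pages):
    """Search pages given as (url, raw_bytes, page_encoding) tuples.

    The query is re-encoded into each page's encoding before matching,
    and every hit is converted back into the user's encoding.
    """
    query_text = decode(query, user_encoding)
    hits = []
    for url, raw, page_encoding in pages:
        query_in_page_encoding = encode(query_text, page_encoding)
        if query_in_page_encoding in raw:
            result_text = decode(raw, page_encoding)
            hits.append((url, encode(result_text, user_encoding)))
    return hits


# The word "Tamil" written in Tamil script, typed by a toy-tab user.
word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"
pages = [
    ("a.example", encode(word + " news", "toy-tscii"), "toy-tscii"),
    ("b.example", encode("news", "toy-tab"), "toy-tab"),
]
hits = search(encode(word, "toy-tab"), "toy-tab", pages)
```

Here the user types in one encoding, the matching page is stored in another, and the hit comes back in the user's own encoding. A real system would need full code charts for each of the 25-odd encodings the engine accepts.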
What about the news aggregator tool that you have developed?
Sify has been using that in four languages, Tamil, Malayalam, Hindi and Telugu.
Can you explain to us the tool which translates English classified advertisements into Tamil?
Our tool would search the database of matrimonials in 'The Hindu'. The speciality is that you need not enter the text in English. You can enter the text criteria in Tamil and the tool would convert the text into English and then search the database and return it to you in Tamil after translating it back from English to Tamil.
Our Tamil search engine is deployed there and machine translation technology is also used. The commercialisation of the tool would be done soon. We have been testing it for the past three months and it has been working well.
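The Tamil-to-English-and-back lookup described above can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the three-word glossary stands in for real machine translation, the `ADS` list stands in for the newspaper's database, and the names `translate` and `search_ads` are hypothetical.

```python
# Toy Tamil-to-English glossary; a stand-in for a real translation system.
TA_TO_EN = {
    "ஆசிரியர்": "teacher",
    "மருத்துவர்": "doctor",
    "சென்னை": "chennai",
}
EN_TO_TA = {en: ta for ta, en in TA_TO_EN.items()}

# Hypothetical English-language classified entries.
ADS = ["teacher chennai", "doctor madurai", "doctor chennai"]


def translate(text: str, glossary: dict) -> str:
    """Word-by-word lookup; words missing from the glossary are kept as-is."""
    return " ".join(glossary.get(word, word) for word in text.split())


def search_ads(tamil_query: str) -> list:
    """Translate the query to English, match ads, translate hits to Tamil."""
    english_terms = translate(tamil_query, TA_TO_EN).split()
    matches = [ad for ad in ADS
               if all(term in ad.split() for term in english_terms)]
    return [translate(ad, EN_TO_TA) for ad in matches]


results = search_ads("மருத்துவர் சென்னை")
```

The user never sees English at any point: the query goes in as Tamil and the matching entries come back as Tamil, with the English database in the middle.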
We now want to use this for real estate type of advertisements.
How many such products do you have and what are your plans for deploying them?
A set of products, technologies and know-how has been developed and tested at the centre and is ready for commercial use:
1. 'Web Guard' - a hardware solution for protecting websites against defacement
2. Public Key Infrastructure (PKI) for secure transactions over enterprise networks
3. Dynamic Password Authentication Token (DPA) for roaming secure access over the Internet
4. Industrial pattern recognition system for recognising faces and objects
5. Indian language content aggregation and management tools for online news portals
6. Multilingual search engine for Indian languages
7. Machine translation systems involving Tamil, Tamil-Hindi and English-Tamil
8. Translingual information accessor for newspaper classified advertisements
9. Processing tools and resources for Indian languages, especially Tamil
10. Proprietary W-LAN (WiFi) security solution for secure wireless networking
11. W-LAN (WiFi) analysis and design tools
12. Nano scale imaging technologies for online industrial inspection
Is there any funding by the Government of Tamil Nadu or any other institution for your projects?
The Tamil Nadu Virtual University (TVU) gave Rs 1 lakh for the search engine. We did not approach the Tamil Nadu government for funds because it needs to spend more on school education.
We have made a Tamil wordnet jointly with the Thanjavur Tamil University.
Can you please tell us about the K B Chandrasekar Foundation and the funding for it?
This AU-KBC Foundation is supported by our alumnus K B Chandrasekar, who belongs to the 1983 batch of MIT. The centre was started five years ago.
K B Chandrasekar has given about Rs 7 crore in the last five years. With that money we have set up the building, labs and other research facilities at MIT, Chromepet. All the staff in the foundation are employed by the K B Chandrasekar Foundation, and I am the only person from the university.
The land for the foundation building was given by Anna University, and the building was constructed with funding from Chandrasekar. The foundation pays a percentage to the university whenever we get projects.
Anna University (AU) and K B Chandrasekar Research Foundation (KBCRF) Pvt Ltd signed a commercialisation agreement for marketing the products developed by AU-KBCRF.
The agreement was signed by the Registrar, Anna University, and one of the directors of KBCRF Pvt Ltd in the presence of Dr E Balagurusamy, Vice-Chancellor, Anna University, and K B Chandrasekar, founder, AU-KBCRF, at Anna University, Chennai.
KBC's funding is like a seed fund. Now we are becoming self-sufficient: we are nearly 80 per cent self-sufficient. Now we have a requirement of Rs 70 lakh for the expenditure of the centre, not including the research funding.
We now generate nearly 80 per cent of the budget and the remaining 20 per cent comes from KBC. We have spent Rs 70 lakh on equipment. The rest is capital; the recurring requirements are nearly Rs 70 lakh per year. We must have spent Rs 1.90 crore on capital. Of the Rs 70 lakh, we generate nearly 80 per cent on our own. We have an understanding with K B Chandrasekar that within five years we will be able to sustain ourselves through our own funds.
We are the first in the country to be a self-supporting research centre.
What are your plans?
We have a major activity in biology - we are the only group in the country dealing with Nano Biology. We have nearly Rs 2.5 crore in funds from the Government of India for this project.
We are trying to analyse how a drug works for a particular type of disease on one single molecule. The Defence Department has found out that garlic eliminates cardiovascular problems at high altitudes. They have asked us to investigate the mechanism of how garlic does this.
President A P J Abdul Kalam gave us a project - to build an English-Tamil translation system. The content should be in English and the system should translate it into Tamil for the user. The basic idea is that even users who do not know English can browse the Internet. It is a very difficult job, but we want to achieve it.
For more information, please mail info4all@au-kbc.org
N Arun Kumar