Languages are a gateway to knowledge. How can digital tools be used to help native language speakers access and contribute knowledge? In this blog, Subhashish Panigrahi shows how endangered languages can be documented and preserved using open standards and tools.
The knowledge that the world has accumulated and encoded over the ages in different languages is valuable for learning about other people’s cultures, traditions, and ways of life. But not every language is privileged to be a language of knowledge and governance.
Almost half of the world’s 6909 living languages are expected to vanish within a century. The most linguistically diverse places, like Papua New Guinea, are also the most dangerous places for languages. Every two weeks a language dies, and with it a wealth of knowledge is lost forever. In my home country, India, alone there are more than 780 languages. The rate at which languages are dying here is extremely high: over 220 languages from India have died in the last 50 years, and 197 languages from the country are identified as endangered by UNESCO.
When these languages die, all the knowledge preserved in them dies with them.
Languages that do not have tools for everyone to access knowledge and contribute to it often go out of use. India, for example, is home to the highest number of visually impaired and illiterate people in the entire world: more than 15 million Indians are visually impaired and 30% are illiterate. Yet few digital accessibility tools exist for either web or mobile, even though there are about 450-465 million internet users and 60% of them are mobile users. In fact, accessibility tools for most Indian languages are proprietary and not affordable.
There have been some efforts by the Indian government, through institutions like the Central Institute of Indian Languages (CIIL), to grow the 22 officially recognized languages and some of the indigenous languages. Founded in 1969, CIIL has been working to deepen research on Indian languages, and a program called “Protection and Preservation of Endangered Languages of India” was introduced in 2014 to help CIIL begin several projects specifically for the conservation of endangered languages.
Only 10-30% of India’s population can understand English, which is the predominant language of the Internet. A recent report published by Google and KPMG states that more than 70% of India’s Internet users trust content in their native language over English. The lack of native language content and of electronic accessibility tools is therefore an important factor in stopping a large number of people from accessing information and contributing to the knowledge commons.
When confronted with a problem of this magnitude, there are a few vital things that must be done to preserve and grow dying languages. Creating audio-visual documentation of some of the most important socio-cultural aspects of the language, such as storytelling, folk literature, oral culture and history, is a start. When done by native language speakers and annotated in a widely spoken language such as English or Hindi, such documentation is one way of creating digital resources in a language. These resources can be used to create content and linguistic tools to grow the languages’ reach.
Sadly, there is little focus from the central government on many of these languages, but there are some efforts by several organisations to document native languages.
There is something that every individual who speaks a less-spoken language, or is in contact with a native speaker of an endangered or indigenous language, can do. Languages that are dying need digital activism to grow educational and accessibility tools. That can happen when more public and open repositories like dictionaries, pronunciation libraries, and audio-visual content are created.
However, not many people know how to contribute in a form that can be used by others to grow resources in a language. Especially in India, contributing to a language is largely skewed by the notion of producing and promoting literature. But in a country where more than 30% of the population is illiterate and a large number of languages are spoken languages (without a written counterpart), it is important that the language content is predominantly audio-visual and not just text-based. More importantly, there is a need for openness so that the whole idea of growing languages does not get jeopardized by proprietary methods and standards.
There are plenty of things anyone can contribute to documenting a language, depending on their own skillset.
Every language has a wealth of oral literature, which is the most crucial thing to document for a dying language. Cultural aspects like folk storytelling, folk songs, and other narratives about cooking, local festival celebrations, performing art forms and so on can be documented in audio-visual form.
Thanks to cheaper smartphones and an ocean of free and open source software, anyone can now record audio, take pictures and shoot videos in really good quality without spending anything on gear. There are open toolkits that aggregate open source tools, educational resources and sample datasets that one can modify and use for their own language.
In the age of AI and IoT, one can indeed build resources that will make their languages more user-friendly. As explained earlier, most of the screen reader software that visually impaired or illiterate people would use does not exist because of the lack of good quality text-to-speech engines. Creating pronunciation libraries of words in a language can help a lot in building both text-to-speech and speech-to-text engines, which can eventually improve screen readers and other electronic accessibility solutions. Cross-language open source tools like LinguaLibre, Kathabhidhana, and Pronuncify help record a large number of pronunciations. Similarly, for languages with an alphabet, educational resources for language learning can be created with open source tools like Poly and OpenWords.
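For anyone starting a pronunciation library from scratch, the first step is usually a clean list of words to record. The short Python sketch below shows one possible way to prepare such a list. It is only an illustration: the function name `build_manifest`, the manifest fields, and the file naming scheme are assumptions made for this example and are not the formats used by LinguaLibre, Kathabhidhana, or Pronuncify. It removes duplicate words (after Unicode normalization, so that visually identical spellings are treated as the same word) and assigns each word an audio file name in the open Ogg format.

```python
import unicodedata


def build_manifest(words, lang_code):
    """Return one entry per unique word, in first-seen order.

    The field names and the file naming scheme are hypothetical,
    chosen only for this illustration.
    """
    seen = set()
    manifest = []
    for raw in words:
        # NFC normalization makes visually identical spellings compare equal.
        word = unicodedata.normalize("NFC", raw.strip())
        if not word or word in seen:
            continue  # skip blank entries and repeated words
        seen.add(word)
        index = len(manifest) + 1
        manifest.append({
            "word": word,
            "lang": lang_code,
            "audio_file": f"{lang_code}_{index:05d}.ogg",  # open audio format
            "recorded": False,  # flip to True once the word is recorded
        })
    return manifest


# A few Odia words, with one repeated entry and one blank entry.
words = ["ପାଣି", "ଘର", "ପାଣି ", "", "ଗଛ"]
manifest = build_manifest(words, "or")
for entry in manifest:
    print(entry["audio_file"], entry["word"])
```

A list like this can be shared openly alongside the recordings, so that others can see which words are already covered and which still need a speaker.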
Building these resources might not result in transforming the state of many endangered languages quickly but will certainly help in gradually bettering the way many people access knowledge in their language.
The work of groundbreaking initiatives like the Global Language Hotspots by the Living Tongues Institute for Endangered Languages and National Geographic can be used to start language documentation projects. But it is always recommended to make the work output available with open standards so that others can build solutions on top of existing interventions.
However, not much information is available about the actual outcome of any government-led activities for endangered language documentation, and especially about whether there is any open access to the published works. The “People’s Linguistic Survey of India” (PLSI), a non-government-led survey, was conducted during 2012-13 under the leadership of Ganesh Devy.
A few years back, Gregory Anderson, founder of Living Tongues, and Prof. K. David Harrison, associate professor at Swarthmore College in Pennsylvania, US, discovered a hidden language called Koro spoken in Arunachal Pradesh. In 2014, Marie Wilcox, the last living speaker of Wukchumni, a North American language, created a dictionary to keep her language alive. Imagine where these languages would have ended up if Anderson, Harrison, and Wilcox had not taken these baby steps back then.