What is OpenSpeaks? What do you do?
OpenSpeaks is an initiative by the O Foundation (OFDN) to build open and collaborative resources for marginalized language communities and help them bring their languages to the digital space. We have been creating many Open Science resources like Open Educational Resources (OER), Open Source tools, open practices, and Open Data in addition to digital archives of languages. We are open by default and fully adhere to the philosophy of Openness.
Currently we have three major focus areas:
- A multimedia documentation toolkit: OER and open source tools for conducting interviews and documenting languages
- A pronunciation toolkit: a workflow consisting of OER (including manuals), open source tools, and datasets for creating a library of pronunciations
- An experimental text-to-speech: an alpha-level text-to-speech system that uses the pronunciation library
There exist many tools and processes. Why can they not be used here?
There are indeed many tools and processes, both proprietary and open, but they might not be fully usable for what we are trying to achieve here. Our goal is to document native languages in order to create digital tools from them. While existing resources can be a great help with “what to ask in an interview”, they don’t always help with the tools, techniques and process used for recording. That part is probably their own trade secret, simply because they charge for conducting the sessions. It is not an open and reproducible process, and it is heavily staff-dependent. While it is true that the staff are professionally trained and the work they produce is likely to be of really high quality, this might not be a sustainable practice, because not all communities can afford to pay even though they want their languages to be documented. Communities need to be equipped with digital tools and provided with education so that they can document their own languages. Our process supports community empowerment by creating open source tools, open educational resources, datasets, and open practices, and by promoting them widely so that anyone can translate the content and use it for their own language. One of the reasons for hosting it on a wiki is to allow integration with MediaWiki’s collaborative translation processes.
What about using existing and easily available tools and platforms like Hangouts On-Air?
That and many other tools can be and will be used. In fact, it is important to keep in mind that many indigenous people might not have good access to fast Internet, so they can record offline. In cases where it is not feasible to travel and spend time with the community in person, one can even record remotely (the process is documented here and the output of this experiment is available here). The idea is not to restrict everything to one solution but to allow diverse tools and processes, as no single process will fit all situations. In this toolkit, the strongest focus is on maintaining good audio quality: low-quality video, or no video at all, is still acceptable, but if there are issues with the audio, the recordings can never be used to create speech synthesis tools like text-to-speech. In fact, the 30-40 minute interview is designed with the sole purpose of collecting diverse kinds of speech, at different emotional levels and with different content, so that there is a good amount of vocabulary, intonation and accent.
Who are the other major players that are also working towards documenting indigenous/endangered languages?
There have been some ground-breaking initiatives to preserve native languages in digital form. Some of the notable ones are Living Tongues Institute for Endangered Languages, Open Language Archives Community (OLAC), Endangered Languages Archive (ELAR) at the SOAS University of London, TVMalintzi (YouTube-based television for Nahuatl), advocacy collectives like Rising Voices at Global Voices, several research and journalistic organizations across the US, Global Oneness Project, Wikitongues (community documentation of video narratives in native languages), National Endowment For The Humanities & Native Americans, the open source language learning app Openwords, Digital Language Diversity Project (DLDP) (OERs for language preservation), Treasure Language Storytelling and a group of language preservation projects by linguist Dr. Steven Bird, StoryCorps for recording storytelling, ourselves at OpenSpeaks, and many more. However, considering the rate of language extinction, the volume of work is enormous and the number of individuals and organizations involved is far too small.