Multimedia toolkit

When languages die, they take with them the knowledge preserved in them. At least one language dies every two weeks. Think about the indigenous cultures, cuisines, weaving techniques, unique soothing music, dance forms, and much more that have only ever been described in a particular language: they are too valuable to lose. We can create a lot with AI, but who would not want to play a digital game, or even a board game, recreated from indigenous gameplay? It is a great benefit to live in an era that has such powerful digital tools to document and grow languages for the many generations yet to come. This is probably the right time to think about how we can take advantage of openness (open source software, open educational resources, open processes and communities, and a diverse range of outcomes in open standards) to transform the state of many endangered languages.

Workflow

(Figure: OpenSpeaks workflow)

1. Mapping the status quo of the endangered languages of India in mostly but not limited to the following areas that affect the growth of languages:

  • state language policy
  • native language education and state literacy
  • media, internet and mobile penetration
  • (digital) tools to access and contribute knowledge
  • electronic accessibility, e.g. availability of screen readers, text-to-speech and speech-to-text, and electronic accessibility tools in public services like ATMs and bus stations, and on smartphones
  • open-licensed resources like corpus and audio libraries
  • available linguistic tools for machine learning and Natural Language Processing
  • organizations working for the development of the endangered languages

2. Identifying demographic zones in need of immediate intervention based on the mapping research. A great inspiration can be the “Language Hotspot” model created by the Living Tongues Institute for Endangered Languages that considers a) highest level of linguistic diversity, b) highest levels of endangerment, and c) least-studied languages to identify the “Language Hotspots”.

3. Toolkit development and pilot

The toolkit consists of a) a collection of free and open source software (I will try to leverage all the available software, or create some where nothing is available), b) user documentation that can be translated into other languages and used across the world, c) sample datasets from the test runs to help with using the toolkit, and d) other Open Educational Resources.

4. Train citizen archivists in select zones and 5. Pilot toolkit + Document + Localize toolkit

Some bilingual native speakers who are conversant in either English or an official language of their region will be trained. These “Citizen Archivists” will use the toolkit to create documentation in their languages and will help annotate it.

The documentation can include either journalistic reports or different linguistic aspects (folklore, folk songs, narration of traditional games and festivals).

6. Building communities of citizen archivists by providing them with constant training. Their input will constantly improve the toolkit and help grow 7. Audio-visual reporting by them

8. Building a repository of stories that matter to the many native-speaker communities and to language research. The annotated audio-visual documentation will not just help grow a historical record of many people in their own language, but also create resources for linguistic research to revive the language. For instance, a recorded audio library is essential for building text-to-speech and speech-to-text engines. Such tools help not just people with visual disabilities or people who cannot read, but everyone.

There are hundreds of reasons why many languages are dying. This toolkit aims at solving one problem at a time.

Check out some of the frequently asked questions.

Be a friend to your interviewee. They will certainly share their stories with you.
Be polite. Be a friend to your interviewee. (Characters designed by Dooder / Freepik, CC BY)

Getting started with this toolkit

Audio recording:

A home studio setup consisting of a computer installed with a free and open source audio recording/editing software like Audacity, a professional microphone, and a monitoring headphone. Read more in our Pronunciation Toolkit.
1. Home studio: You need a microphone to record audio. If you can, I would suggest recording in a small home studio setup like the one pictured above (a USB microphone, a computer, and monitoring headphones).

A digital audio recorder is used to record audio during field recording. (Marie-Lan Nguyen, CC-BY 2.5)
2. Field recording with a recorder or phone: The recording setup will vary considerably if you are meeting someone outside your home for a field recording. In that case you will need to carry an audio recorder or a smartphone (with a recording app installed) and earphones. If you’re using a portable recorder, cover the top of the mic with a soft cotton cloth or fake fur to a) keep dust out, and b) reduce the sound of the wind during outdoor recording. Use a rubber band to tighten it at the base, and never touch the cloth or fur while recording. Mics can capture even tiny movements, which can completely distort the audio.
3. Recording from phone: The earphones that come with phones generally work better, on both phones and computers, than the default built-in microphone. However, avoid sitting in an open space, as a lot of noise is likely to be captured unless you are using a shotgun microphone.

4. Audio editing software: If you are editing on a computer, Audacity, a free and open source audio editor, is the first choice for many seasoned recording artists. It is robust, easy to use, and runs on multiple platforms. If you are using your phone or tablet to record and edit the audio, use its native recording app or look for a good free alternative in your app store. Ideally the recording/editing app should let you record in decent lossless quality (the minimum requirement is 44100 Hz, above 16 bit PCM, i.e. 24 or 32 bit, and above 220 kbps; check your settings to find these). Save the audio as .WAV or .FLAC (Audacity supports both). If your recorder or phone does not support these formats, use an app or online converter (MP3→FLAC or M4A→FLAC) to convert the audio into .FLAC.
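
If you are comfortable running a small script, the quality settings above can be checked on a saved .WAV file. The following is only a minimal sketch using Python's standard `wave` module, which reads uncompressed PCM .WAV files (not .FLAC). The function names `check_wav` and `meets_minimum` are made up for this illustration, and the default bit-depth floor of 16 is an assumption (the sample recording documented in this toolkit is 16 bit PCM); pass 24 to follow the stricter "above 16 bit" guidance.

```python
import wave

MIN_SAMPLE_RATE_HZ = 44100  # minimum sample rate named in this toolkit


def meets_minimum(sample_rate_hz, bit_depth, min_bit_depth=16):
    """Return True if the values satisfy the minimum recording quality.

    min_bit_depth=16 is an assumption for this sketch; use 24 to be stricter.
    """
    return sample_rate_hz >= MIN_SAMPLE_RATE_HZ and bit_depth >= min_bit_depth


def check_wav(path, min_bit_depth=16):
    """Read a PCM .wav file and return (sample_rate_hz, bit_depth, ok)."""
    with wave.open(path, "rb") as wav_file:
        sample_rate_hz = wav_file.getframerate()
        # getsampwidth() returns bytes per sample; convert to bits.
        bit_depth = wav_file.getsampwidth() * 8
    return sample_rate_hz, bit_depth, meets_minimum(sample_rate_hz, bit_depth, min_bit_depth)
```

For example, `check_wav("interview.wav")` (a hypothetical file name) returns the sample rate, the bit depth, and whether both meet the minimum.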

Video recording:

1. Which camera to use: Frankly speaking, video is less important here than audio. Viewers can still manage with low-quality video if the audio is loud and clear. So if you are keen on investing, invest in a good-quality microphone that can either be connected to the camera or used as a secondary recorder. Do not trust your camera’s default microphone; it can jeopardize your hard work. As for the camera, you can use any camera that records in decent quality, i.e. above 720p (1280×720 px), from your phone to a point-and-shoot camera to a DSLR.

a) Using a camera: Use a shotgun microphone that connects directly to your camera so that you don’t need to spend much effort on audio syncing during post-production.
b) Using a phone for recording video: These days most phones come with high-quality hardware capable of recording good video. But the real key to recording quality video on a phone lies in stabilizing the shot while recording. The way to do that is to invest in a small tripod that can hold your phone (they are generally cheap and do the job). For this particular project, tripods will be the best.

2. How to edit the videos: You do not need to edit the videos, as we will do that for you. You only need to compress the video using free software like HandBrake and upload it to YouTube or something similar without making it public. We will download it and then ask you to delete it, so that you don’t have to worry about the space it takes up on your hard drive.
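
HandBrake also ships a command-line program, HandBrakeCLI, so the compression step can be scripted if you have many videos. The sketch below is one possible way to do it, assuming HandBrakeCLI is installed and on your PATH. The function names are made up for this illustration, and the x264 encoder with a quality value of 22 is an assumed starting point, not a setting prescribed by this toolkit.

```python
import subprocess


def build_handbrake_command(source, destination, quality=22):
    """Build the HandBrakeCLI argument list for one video.

    quality is HandBrake's constant-quality value; 22 is an assumption
    for this sketch (lower numbers mean higher quality and bigger files).
    """
    return [
        "HandBrakeCLI",
        "--input", str(source),
        "--output", str(destination),
        "--encoder", "x264",
        "--quality", str(quality),
    ]


def compress_video(source, destination, quality=22):
    """Run HandBrakeCLI and raise an error if the compression fails."""
    subprocess.run(build_handbrake_command(source, destination, quality), check=True)
```

Calling `compress_video("interview.mov", "interview.mp4")` (hypothetical file names) would write a compressed copy that is ready to upload.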

Interview process:

It takes years to learn to capture the best emotion in an interview, but you can master some basic gestures that will let you document really valuable information in your audio/video. Before you start the interview, spend some time engaging the interviewee in a friendly manner. Ask about them, what they ate today and so on, just like a friend. Explain why you are interviewing them and how it will help preserve their language. Some people are intimidated by knowing that their voice or video is going to be public, so explain how they are helping to preserve their language in a form that future generations can also access. Languages are changing rapidly because of many external influences. The best way to preserve them is to record them and make the recordings available to others. Share the fact that at least one language dies every two weeks.

Below is a set of things you need to ask in whichever language you are interviewing. Questions #1 to 10 are mandatory, and the remaining are optional. Read the following to the interviewee right before the interview (you can modify it as appropriate, and even translate it into whichever language you speak with them):

Hi, my name is XYZ. I'm calling from THE PLACE YOU'RE CALLING FROM to document a few details about your language "LANGUAGE NAME" (optional: for our project PROJECT NAME) so that the valuable knowledge of your language gets recorded in an accessible form. Based on the form that you filled out, I am recording this call.

This interview will take about 30 minutes. I will upload the recorded interview publicly under a Creative Commons Attribution-ShareAlike license called CC-BY-SA 4.0. This license allows anyone to use, share, and modify the content, even for commercial purposes. Can I have your permission to proceed?

Ask the following questions if they allow you to proceed:

  1. Can you pronounce your name the way you’d do in your native language/mother tongue?
  2. Where were you born? (skip if they’re not comfortable)
  3. What games did you play as a child? Can you share a little about those games?
  4. (with some curiosity on your face) We all have our grandma stories. Can you tell me one that you heard as a kid from your grandparents or someone elderly? (nod appropriately and show your emotions while listening to their stories; smile or frown, but do not make any noise, as we want only their voice to be recorded.)
  5. (with a smile on your face) Who doesn’t like songs, even though not everyone is a great singer? Your language must have many songs. Would you mind singing one for me? (same gestures as above)
  6. Did you visit a local fair with friends and family as a kid? Can you share your experience?
  7. Imagine I cannot see anything. Can you explain to me in words all the activities that you do from daybreak until you go to bed?
  8. What’s your favorite traditional food? How is it prepared? (again, react while listening, with curiosity on your face, and nod appropriately)
  9. Can you tell me some words in your language? Maybe for things that you use every day?
  10. (if you yourself are not a native speaker) If I were to learn your language, how would I greet a guest in the house, talk with them, or offer them some food?
    (if you speak the same language as they do) Can you act out how you’d welcome a guest to your home and explain to me the meaning of each of the greetings? (act as the guest and ask the meaning of all the greeting and conversation phrases they say)

Metadata

Collecting metadata is the most essential component of this documentation. When you document something in a less-known language and publish it online, you also need to share some of the most vital information about the documentation. See a sample below that is taken from our Karbi-language documentation page.

Language details

  • Language: Karbi
  • Dialect: N/A
  • Alternate name(s): Mikir
  • SIL Code: mjw
  • Current state: Living, endangered
  • Language group: Tibeto-Burman languages
  • Possible influences: Assamese, Naga
  • Open Language Archives Community (OLAC): here
  • Swadesh word list: here
  • Ethnologue: here
  • SIL International: here
  • Wikipedia: here
  • Wikipedia in respective language: N/A
  • Scholarly citations on Google Scholar: here
  • Internet Archive resources: here

Recording details

  • Recording content: Narration of a folklore, a folk song, a local festival, traditional games, meaning of “Karbi”, speaker’s daily activities
  • Recording location: Remote, speaker at Karbi Anglong district, Assam, India
  • Recording date: Sunday, May 21, 2017 at 8:55 AM
  • Recordist(s): Subhashish Panigrahi
  • Hardware: Zoom H1
  • Software: Audacity
  • Number of files: 10
  • File format(s): wav, flac
  • Bit depth: 16 bit PCM
  • Total audio length (HH:MM:SS): 00:53:37
  • Copyright: CC-BY-SA 4.0
  • Video: N/A
  • Image: N/A

Speaker details

  • Speaker Gender: Male
  • Speaker Age: 50–60
  • Speaker Origin: Karbi Anglong district, Assam, India
  • Speaker’s Name: D.S. Teron
  • Speaker’s Pictures: N/A
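
If you keep metadata for many recordings, it can help to store each record in a structured, machine-readable form next to the audio files. The following is a minimal sketch in Python that saves a subset of the Karbi sample above as JSON. The class name `RecordingMetadata`, its field names, and the helper `to_json` are assumptions made up for this illustration; they are not a format that this toolkit or any archive requires.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class RecordingMetadata:
    """A subset of the metadata fields above (field names are hypothetical)."""
    # Language details
    language: str
    alternate_names: list
    sil_code: str
    current_state: str
    # Recording details
    recording_content: list
    recording_location: str
    recording_date: str        # ISO 8601 date and time
    recordists: list
    hardware: str
    software: str
    file_count: int
    file_formats: list
    total_audio_length: str    # HH:MM:SS
    copyright_license: str
    # Speaker details
    speaker_name: str
    speaker_gender: str
    speaker_age: str
    speaker_origin: str


def to_json(record):
    """Serialize one record as human-readable JSON, keeping non-ASCII text."""
    return json.dumps(asdict(record), ensure_ascii=False, indent=2)


# Values copied from the Karbi sample above.
karbi = RecordingMetadata(
    language="Karbi",
    alternate_names=["Mikir"],
    sil_code="mjw",
    current_state="Living, endangered",
    recording_content=["folklore", "folk song", "local festival",
                       "traditional games", "meaning of Karbi",
                       "speaker's daily activities"],
    recording_location="Remote, speaker at Karbi Anglong district, Assam, India",
    recording_date="2017-05-21T08:55",
    recordists=["Subhashish Panigrahi"],
    hardware="Zoom H1",
    software="Audacity",
    file_count=10,
    file_formats=["wav", "flac"],
    total_audio_length="00:53:37",
    copyright_license="CC-BY-SA 4.0",
    speaker_name="D.S. Teron",
    speaker_gender="Male",
    speaker_age="50–60",
    speaker_origin="Karbi Anglong district, Assam, India",
)

print(to_json(karbi))
```

A plain JSON file like this can be opened in any text editor and uploaded alongside the recordings.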

Social media

Social media platforms are great ways to promote endangered, indigenous and other marginalized languages, mainly for two reasons: a) most social media hubs like Facebook, Twitter, and Snapchat already have a large user base, so one does not need to invest too much to promote content, and b) most of the popular social media platforms are run by big corporations, or at least by startups, that keep innovating to improve the user experience. It is important to make optimal but secure use of their features. Check out this toolkit, which consists mostly of open practices and methodologies plus some open educational resources on configuration settings, to experiment with social media platforms as a production and promotion tool for your language.

This Open Educational Resource is also published and cataloged on OER Commons, and can also be accessed from the Peer To Peer University (P2PU).