Lekatha is a text-to-speech (TTS) project that is still in its infancy. The word lekatha has no meaning of its own; it is constructed from two Odia words, ଲେଖା (lekha, meaning text) and କଥା (katha, meaning voice), and refers to voice constructed from text by a TTS engine.
What does it do?
If Lekatha is a locomotive, its engine is a Python-based TTS tool originally written by Alex I. Ramirez. Thanks to Alex for releasing it as open-source software. Subhashish Panigrahi then created the code and workflow for Odia.
Lekatha’s workflow can be understood in four basic steps:
- Each phoneme of an Odia word is converted into its Latin-character equivalent using a converter. For this, an Odia–Latin transcription chart (similar to Arpabet) was created that defines a Latin equivalent for each Odia phoneme.
- The output from the converter is copied into a text file.
- Each phoneme is recorded and saved as a .wav file in a folder. For instance, the Odia phoneme “କା” has the Latin equivalent “KA”, and its audio file is named after that Latin equivalent.
- When the tool is run, it asks for a word or phrase, which has to be typed in the Latin alphabet. The tool then matches the input against the recorded phonemes and joins them to create the word.
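The last two steps (look up the word's phonemes, then join their recordings) can be sketched in a few lines of Python. This is a minimal illustration, not Lekatha's actual code. It assumes the wordlist has already been loaded into a dictionary such as {"KOSI": ["KO", "SI"]}, that each clip is named <PHONEME>.wav, and that all clips share the same channel count, sample width and sample rate. Instead of playing the result through PyAudio, it writes the joined audio to a new WAV file using Python's standard wave module. The function name synthesize is hypothetical.

```python
import wave
from pathlib import Path


def synthesize(phrase, wordlist, sounds_dir, out_path):
    """Join the recorded phoneme clips for every word in `phrase` into one WAV file.

    Hypothetical helper: `wordlist` maps a Latin-alphabet word to its phonemes,
    e.g. {"KOSI": ["KO", "SI"]}, and each phoneme has a clip <PHONEME>.wav in
    `sounds_dir`. All clips are assumed to share channels, sample width and rate.
    """
    sounds = Path(sounds_dir)

    # Upper-case the typed words to match entries like KOSI;
    # a word missing from the wordlist raises KeyError.
    phonemes = []
    for word in phrase.upper().split():
        phonemes.extend(wordlist[word])
    if not phonemes:
        raise ValueError("nothing to synthesize")

    # Read the raw audio frames of each phoneme clip, in order.
    params = None
    chunks = []
    for phoneme in phonemes:
        with wave.open(str(sounds / f"{phoneme}.wav"), "rb") as clip:
            if params is None:
                params = clip.getparams()  # output format taken from the first clip
            chunks.append(clip.readframes(clip.getnframes()))

    # Write the concatenated frames using the first clip's format.
    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(params.nchannels)
        out.setsampwidth(params.sampwidth)
        out.setframerate(params.framerate)
        out.writeframes(b"".join(chunks))
    return phonemes
```

For example, synthesize("KOSI", {"KOSI": ["KO", "SI"]}, "sounds", "kosi.wav") would write kosi.wav from sounds/KO.wav followed by sounds/SI.wav. Plain concatenation like this may leave audible clicks at the joins; smoothing them (for instance with a short crossfade) is left out to keep the sketch short.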
Odia transcription chart
| Odia phoneme | Latin equivalent | Odia phoneme | Latin equivalent |
|---|---|---|---|
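The converter in the first workflow step can be approximated by a greedy longest-match lookup against the chart. The sketch below is hypothetical and is not the converter Lekatha uses: its three-entry CHART contains only the mappings quoted in this article (କା → KA, plus କ → KO and ସି → SI from the କସି example), whereas a real chart would cover every Odia phoneme. It also assumes, as in the କସି example, that a word's Latin form is simply its phoneme tokens run together, which is how the hypothetical wordlist_line helper builds an entry.

```python
# Hypothetical excerpt of an Odia–Latin chart; keys are written as escapes
# so the exact Unicode code points are visible.
CHART = {
    "\u0b15\u0b3e": "KA",  # କା
    "\u0b15": "KO",        # କ (consonant with its inherent vowel)
    "\u0b38\u0b3f": "SI",  # ସି
}


def transcribe(odia_word, chart=CHART):
    """Convert an Odia word to Latin phoneme tokens by greedy longest match."""
    longest = max(len(key) for key in chart)
    tokens = []
    i = 0
    while i < len(odia_word):
        # Try the longest possible chart entry first, then shorter ones.
        for size in range(min(longest, len(odia_word) - i), 0, -1):
            piece = odia_word[i:i + size]
            if piece in chart:
                tokens.append(chart[piece])
                i += size
                break
        else:
            raise ValueError(f"no chart entry for {odia_word[i]!r} at position {i}")
    return tokens


def wordlist_line(odia_word, chart=CHART):
    """Build a wordlist entry: <Latin word><two spaces><phonemes>."""
    tokens = transcribe(odia_word, chart)
    return "".join(tokens) + "  " + " ".join(tokens)
```

With this chart, wordlist_line("କସି") returns the entry KOSI followed by two spaces and KO SI. Matching the longest entry first matters because a syllable such as କା would otherwise be split into କ (KO) plus an unmatched vowel sign.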
A small set of phonemes was recorded to test how the tool works. Not all the examples below are real words; some were constructed to see how the tool handles different phoneme combinations.
| Odia word | Latin equivalent | Odia transcription chart | Synthesized audio |
|---|---|---|---|
How to test the tool for your language?
- A computer running Linux or macOS (preferably upgraded to the latest stable OS version)
- Python 3 or later (download the latest stable version from here). You may already have Python 2.7 or another version older than Python 3 installed; do NOT delete it.
- PyAudio (download and installation instructions here)
- Audacity (download the latest version from here)
- A wordlist: a text file containing words of your language in the following format:
<Word in Latin alphabet><space><space><Phonetic transcription>
It would look like this:
WORD  WO AR D
For instance, the Odia word କସି should be added to the text file as:
KOSI  KO SI
- The phonetic transcription is different for each language. If your language is written in the Latin alphabet, you can use this tool to create a wordlist that can be used directly with our software. If not, you can first create a phonetic transcription chart like the Odia one shown above.
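A wordlist in this format is straightforward to read programmatically. The hypothetical parse_wordlist below is a sketch, not the parser Lekatha ships with: it splits each line on whitespace, treating the first field as the word and the remaining fields as its phonemes. It therefore accepts the two-space separator described above, and would also tolerate a single space, which the original tool may not.

```python
def parse_wordlist(text):
    """Parse 'WORD  PH1 PH2 ...' lines into {"WORD": ["PH1", "PH2", ...]}."""
    entries = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # splits on any run of whitespace, including two spaces
        if not fields:
            continue  # ignore blank lines
        word, *phonemes = fields
        if not phonemes:
            raise ValueError(f"line {lineno}: {word!r} has no transcription")
        entries[word] = phonemes
    return entries
```

To load a wordlist file, one could call parse_wordlist(Path("wordlist.txt").read_text(encoding="utf-8")) after from pathlib import Path; the entries "WORD  WO AR D" and "KOSI  KO SI" become {"WORD": ["WO", "AR", "D"], "KOSI": ["KO", "SI"]}.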
- Install Python 3, PyAudio and Audacity.
- Download the tool and unzip it.
- Go to the “sounds” folder and delete everything.
- Record all the phonemes of your language using Audacity, and save each one with exactly the name you used in your transcription. For instance, if a phoneme is defined as “KO”, save it as KO.wav. Ideally, all the phonemes of your language should be recorded for the tool to work fully, but you can record only a few to test it.
- Edit “wordlist.txt” inside the folder, replace its contents with your list of words (see the previous section to learn how to create one for your language), and save it.
- Open the Terminal on Linux or macOS and use cd FILELOCATION to go to the folder.
- For instance, if your “Lekatha” folder is on your Desktop, you need to type:
- Now type:
- You will see the message “Enter a word or phrase:”
- Type the word in the Latin alphabet, e.g. “KOSI”, and press Enter.
- The tool should then pronounce the word.
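Since the tool builds words by joining recorded clips, it can help to check the wordlist against the “sounds” folder before running it. The helper below is a hypothetical sketch, not part of Lekatha: it assumes the wordlist format described earlier (word first, then its phonemes) and clips named <PHONEME>.wav, and it lists every phoneme that appears in the wordlist but has no recording yet.

```python
from pathlib import Path


def missing_recordings(wordlist_path, sounds_dir):
    """List phonemes used in the wordlist that have no <PHONEME>.wav clip."""
    # Names of the clips already recorded, without the .wav extension.
    recorded = {clip.stem for clip in Path(sounds_dir).glob("*.wav")}

    needed = set()
    text = Path(wordlist_path).read_text(encoding="utf-8")
    for line in text.splitlines():
        fields = line.split()
        needed.update(fields[1:])  # skip the word itself, keep its phonemes

    return sorted(needed - recorded)
```

Running print(missing_recordings("wordlist.txt", "sounds")) from inside the Lekatha folder would print the phonemes that still need to be recorded, or an empty list if none are missing.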
- All software components are licensed under the GNU General Public License v3.0, and the text, audio-visual material and documentation are licensed under the Creative Commons Attribution-ShareAlike 4.0 license.
- When forking the software, please use the following attribution:
Original software: Alex I. Ramirez, Apache License 2.0. <https://github.com/alexram1313/text-to-speech-sample>. Derivative: Subhashish Panigrahi, GNU General Public License v3.0. <https://github.com/OdiaWikimedia/Lekatha>
- When making derivatives of anything other than the software, please use the following attribution:
Subhashish Panigrahi, 2017, CC-BY-SA 4.0