Automatic multilingual speech recognition with support for code-switching (on the example of the Karelian and Russian languages)
The objective of this project is to develop a prototype multilingual speech recognition system with support for code-switching, using Karelian and Russian as an example.
Many communities in the world use two or more languages in everyday communication (multilingualism). One of the most striking examples of multilingualism is India, which has over 400 living languages and where the vast majority of citizens speak at least two languages. There are over 150 languages in Russia, which has led to well-established multilingualism in a number of regions (Republic of Tatarstan, Republic of Tuva, Republic of Dagestan, etc.).
One phenomenon characteristic of multilingual communities is code switching. Code switching is the spontaneous transition of a speaker from one language or dialect to another. Code switching can occur both between utterances and within a sentence.
Developing an automatic speech recognition system with support for code-switching is considerably more challenging than developing a plain multilingual system. The main difficulty is the lack of training data. This applies primarily to text data, since written texts are usually stylistically edited and are therefore free of code-switching. Solving this problem requires substantial work on collecting and annotating specific language data, as well as developing methods for training data augmentation. Acoustic and language modeling of code-switched speech is a non-trivial task, and, in general, automatic speech recognition systems of this type perform worse than systems with no support for code-switching. The development of a speech recognition system for Karelian and Russian is further complicated by the fact that Karelian is a low-resource language, i.e., a language with little information support (few or no Internet resources, digitized databases, or language processing software).
The development of the proposed system is relevant for two reasons: first, the approaches and solutions found will be useful for developing speech recognition systems with code-switching for other languages; second, the development of such a system contributes to the study of Karelian, which is especially important because Karelian is an endangered language.
The practical value of this study is that the creation of the proposed system contributes to research on the low-resource Karelian language, and the results of the project can be used by field linguists working on language contact and modern Karelian.
Results for 2024
At the first stage of the project in 2024, the following tasks were completed: conducting an analytical review of the research topic; recording, transcribing, and segmenting speech data in Livvi-Karelian with Karelian-Russian code switching; developing a phoneme alphabet merging the phonemes of the Karelian and Russian languages, and developing a phonemic vocabulary for the Karelian-Russian speech recognition system.
The analytical review encompasses more than 50 sources. It examines the main methods and approaches to building a speech recognition system that supports code-switching. It also addresses key techniques used for training low-resource systems. The review concludes that one of the most effective training methods for such systems is to take pre-trained multilingual models and fine-tune them on data in the target languages. Additionally, various methods of speech and text data augmentation can be employed, including speech synthesis, partial automatic text translation, and text modification.
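As an illustration of partial automatic text translation, the sketch below turns a monolingual word sequence into a synthetic code-switched one by replacing some words with their translations. It is a minimal sketch and not the project's method: the function name `partial_translate`, the switching probability, and the two-entry dictionary are all assumptions made for the example, and the dictionary entries are toy placeholders rather than verified corpus data.

```python
import random

# Toy bilingual dictionary (hypothetical entries, for illustration only).
TOY_DICTIONARY = {"koiru": "собака", "kodi": "дом"}


def partial_translate(words, dictionary, switch_prob, rng):
    """Replace each word that has a dictionary entry with probability switch_prob."""
    result = []
    for word in words:
        if word in dictionary and rng.random() < switch_prob:
            result.append(dictionary[word])   # switch to the other language
        else:
            result.append(word)               # keep the original word
    return result


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that runs are repeatable
    sentence = ["koiru", "on", "kodi"]
    print(partial_translate(sentence, TOY_DICTIONARY, 0.5, rng))
```

A real augmentation pipeline would also need to handle inflected forms and word order, which a word-by-word lookup ignores.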
Samples of spontaneous speech in Livvi-Karelian were recorded. A total of 37 native Karelian speakers (16 men and 21 women) took part in the recording sessions. After removing noisy fragments, the total duration of the speech data is 3 hours. Russian speech accounts for 27% of the recordings. The recordings are stored as WAV files with a sampling rate of 16 kHz, 16 bits per sample, mono.
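The stated recording format (16 kHz, 16 bits per sample, mono) can be checked programmatically. The following is a minimal sketch using Python's standard `wave` module; the function names and the silent demo file are assumptions introduced for the example and are not part of the project's tooling.

```python
import wave

# Target format as stated in the text: 16 kHz, 16 bits per sample, mono.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2
EXPECTED_CHANNELS = 1


def check_wav_format(path):
    """Return (matches_expected_format, duration_in_seconds) for a WAV file."""
    with wave.open(path, "rb") as wav:
        matches = (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )
        duration = wav.getnframes() / wav.getframerate()
    return matches, duration


def write_silence(path, seconds, rate=EXPECTED_RATE_HZ):
    """Write a silent 16-bit mono WAV file; used only to demonstrate the check."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
```

Summing the returned durations over all files gives the total corpus duration.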
The audio recordings were transcribed and segmented into individual clauses. A speech corpus "Speech Database with Karelian-Russian Code-Switching (KarRusCoS)" containing annotations of the recorded speech data was created. KarRusCoS includes: audio recordings of Karelian speech as well as annotations representing the speaker's identification number; gender; transcriptions of his/her utterances; the duration of each clause; the number of words in Karelian; the number of words in Russian; the number of words with intra-word code-switching; and the total word count per phrase. A certificate of database registration has been obtained from the Federal Institute of Industrial Property (FIPS).
A phoneme alphabet was created by merging the phoneme sets of the Karelian and Russian languages, resulting in a total set of 68 phonemes.
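Merging two phoneme sets amounts to a set union in which shared phonemes are counted once. The sketch below shows the idea on two small toy inventories; these are assumptions for illustration and do not reproduce the project's actual 68-phoneme alphabet or its notation.

```python
# Toy inventories (hypothetical subsets, not the project's phoneme sets).
KARELIAN_TOY = {"a", "e", "i", "o", "u", "y", "ä", "ö", "k", "t", "p"}
RUSSIAN_TOY = {"a", "e", "i", "o", "u", "ɨ", "k", "t", "p", "tʲ", "ʂ"}


def merge_phoneme_sets(*phoneme_sets):
    """Return a sorted list with every phoneme from the given sets, without duplicates."""
    return sorted(set().union(*phoneme_sets))


def phoneme_to_id(alphabet):
    """Map each phoneme of the merged alphabet to an integer index."""
    return {phoneme: index for index, phoneme in enumerate(alphabet)}


if __name__ == "__main__":
    merged = merge_phoneme_sets(KARELIAN_TOY, RUSSIAN_TOY)
    shared = KARELIAN_TOY & RUSSIAN_TOY
    print(len(merged), "phonemes in the merged set,", len(shared), "shared")
```

With these toy sets, the merged size equals the sum of the two inventory sizes minus the number of shared phonemes.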
A phonemic vocabulary was compiled, combining word forms from both Karelian and Russian. Additionally, to account for intra-word code-switching, the vocabulary includes Russian word stems with Karelian affixes. Phonemic transcriptions for all words in the vocabulary were generated automatically.
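The idea of adding Russian stems with Karelian affixes and transcribing every entry automatically can be sketched as follows. This is a simplified illustration under stated assumptions: the stems, affixes, and letter-to-phoneme table are hypothetical placeholders (not verified Karelian morphology), and a one-letter-to-one-phoneme lookup stands in for whatever transcription rules the project actually uses.

```python
from itertools import product

# Hypothetical placeholders, written in Latin transliteration for simplicity.
RUSSIAN_STEMS = ["magazin", "škol"]
KARELIAN_AFFIXES = ["as", "ah"]

# Toy grapheme-to-phoneme table: one letter maps to one phoneme symbol.
TOY_G2P = {
    "a": "a", "g": "g", "h": "h", "i": "i", "k": "k", "l": "l",
    "m": "m", "n": "n", "o": "o", "s": "s", "z": "z", "š": "sh",
}


def hybrid_forms(stems, affixes):
    """Combine every stem with every affix to model intra-word code-switching."""
    return [stem + affix for stem, affix in product(stems, affixes)]


def transcribe(word):
    """Return the phoneme sequence of a word; raises KeyError on unknown letters."""
    return [TOY_G2P[letter] for letter in word]


def build_lexicon(words):
    """Map each word form to its automatically generated transcription."""
    return {word: transcribe(word) for word in words}


if __name__ == "__main__":
    for word, phonemes in build_lexicon(hybrid_forms(RUSSIAN_STEMS, KARELIAN_AFFIXES)).items():
        print(word, " ".join(phonemes))
```

Generating all stem and affix combinations overgenerates, so a practical vocabulary would likely be filtered against forms attested in the corpus.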
The findings of the research in 2024 were presented at the International Conference on Speech and Computer (SPECOM 2024) (Belgrade, Serbia), the 5th International Scientific Conference on Engineering and Applied Linguistics "Piotrovsky Readings 2024" (St. Petersburg, Russia), and the 20th Scientific Conference "Bubrich Readings: Traditions and Innovations in the Study of Finno-Ugric Languages and Cultures" (Petrozavodsk, Russia). The results obtained were published in the Lecture Notes in Computer Science series.
Addresses of Internet resources prepared for the Project:
- Kipyatkova I., Kagirov I., Dolgushin M., Rodionova A. Towards a Livvi-Karelian End-to-End ASR System // In Proc. of 26th International Conference on Speech and Computer SPECOM 2024, Springer LNCS, vol. 15299, Belgrade, Serbia, 2024, pp. 57-68.
- KarRusCoS – Speech Database with Karelian-Russian Code-Switching