Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case
The article was submitted to a special issue of the journal Mathematics: Special Issue "Recent Advances in Neural Networks and Application".
Status: Published.
Kipyatkova I., Kagirov I. Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case // Mathematics. 2023, vol. 11(18), ID 3814.
Authors: Irina Kipyatkova, Ildar Kagirov
Abstract: Recently, there has been a growth in the number of studies addressing the automatic processing of low-resource languages. The lack of speech and text data significantly hinders the development of speech technologies for such languages. This paper introduces an automatic speech recognition system for Livvi-Karelian. Acoustic models based on artificial neural networks with time delays and hidden Markov models were trained using a limited speech dataset of 3.5 hours. To augment the data, pitch and speech rate perturbation, SpecAugment, and their combinations were employed. Language models based on 3-grams and neural networks were trained using written texts and transcripts. The achieved word error rate metric of 22.80% is comparable to other low-resource languages. To the best of our knowledge this is the first speech recognition system for Livvi-Karelian. The results obtained can be of a certain significance for development of automatic speech recognition systems not only for Livvi-Karelian, but also for other low-resource languages, including the fields of speech recognition and machine translation systems. Future work includes experiments with Karelian data using techniques like transfer learning and DNN language models.
Keywords: low-resource languages; automatic speech recognition; audio data augmentation; time delay neural network; hidden Markov models; long short-term memory.
Training and testing script for the Karelian speech recognition system, for the Kaldi toolkit: run_tdnnf.sh
Training script with spectral augmentation (SpecAugment) and testing of the Karelian speech recognition system, for the Kaldi toolkit: run_tdnnf_SpecAugment.sh
Examples from the test audio dataset with reference transcripts and recognition results (“Ref” is the reference text of the phrase; “Hyp” is the recognition result; C is the number of correctly recognized words; S, I, D are the numbers of substituted, inserted, and deleted words, respectively):
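For readers unfamiliar with spectral augmentation, the masking idea behind SpecAugment can be sketched in a few lines of plain Python. This is an illustration only and is not taken from run_tdnnf_SpecAugment.sh or from Kaldi: the function name `spec_augment`, its parameters, and the list-of-frames spectrogram layout are all assumptions, and the time-warping step of the original SpecAugment method is omitted.

```python
import random


def spec_augment(spec, max_freq_width=2, max_time_width=3, fill=0.0, rng=None):
    """Return a masked copy of `spec`.

    `spec` is assumed to be a non-empty list of frames, each frame a list of
    frequency-bin values. One band of frequency bins and one span of frames
    are overwritten with `fill`. All names and defaults are hypothetical.
    """
    rng = rng or random.Random()
    n_frames = len(spec)
    n_bins = len(spec[0])
    out = [list(frame) for frame in spec]  # copy, so the input stays intact

    # Frequency mask: pick a width, then a start bin, and blank that band
    # in every frame.
    f_width = rng.randint(0, min(max_freq_width, n_bins))
    f_start = rng.randint(0, n_bins - f_width)
    for frame in out:
        for k in range(f_start, f_start + f_width):
            frame[k] = fill

    # Time mask: pick a width, then a start frame, and blank those frames.
    t_width = rng.randint(0, min(max_time_width, n_frames))
    t_start = rng.randint(0, n_frames - t_width)
    for i in range(t_start, t_start + t_width):
        out[i] = [fill] * n_bins

    return out


if __name__ == "__main__":
    toy = [[1.0] * 8 for _ in range(10)]  # 10 frames x 8 bins, toy data
    for frame in spec_augment(toy, rng=random.Random(0)):
        print(frame)
```

Because the masks are drawn anew on each call, the same utterance yields a different masked spectrogram every time it is presented, which is what makes the technique useful when only a few hours of speech are available.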
Ref: vot konzu kondiedu näit midä pidäy ruadua
Hyp: vot konzu kondiedu näit midä pidäy ruadua
C=7; S=0; I=0; D=0
Ref: da nygöi kaikile kalmužimale levittih landišat
Hyp: da nygöi kaikile kalmužimale levitti ellendykset
C=4; S=2; I=0; D=0
Ref: noril'skas oli ruadoi sie sit že zavodas vot
Hyp: no rikas oli ruadoi sie sit zavodas vot
C=6; S=1; I=1; D=1
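
The C, S, I, D counts above come from a word-level minimum-edit-distance alignment between the reference and the hypothesis. The sketch below shows one common way to compute them and the resulting word error rate, WER = (S + D + I) / N, where N = C + S + D is the number of reference words. It is not the scoring tool used by the authors; the function names `align_counts` and `wer` are hypothetical.

```python
def align_counts(ref: str, hyp: str) -> dict:
    """Count correct, substituted, inserted and deleted words (hypothetical helper)."""
    r, h = ref.split(), hyp.split()
    n, m = len(r), len(h)

    # dist[i][j] = minimum number of edits turning r[:i] into h[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )

    # Walk back from the end of both strings to classify each step.
    c = s = ins = d = 0
    i, j = n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and r[i - 1] == h[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (0 if same else 1):
            if same:
                c += 1
            else:
                s += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return {"C": c, "S": s, "I": ins, "D": d}


def wer(ref: str, hyp: str) -> float:
    """Word error rate: (S + D + I) divided by the number of reference words."""
    k = align_counts(ref, hyp)
    n_ref = k["C"] + k["S"] + k["D"]
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return (k["S"] + k["D"] + k["I"]) / n_ref


if __name__ == "__main__":
    ref = "noril'skas oli ruadoi sie sit že zavodas vot"
    hyp = "no rikas oli ruadoi sie sit zavodas vot"
    print(align_counts(ref, hyp), wer(ref, hyp))
```

Applied to the three examples above, this alignment gives the same counts as listed; for the third phrase, three errors against eight reference words correspond to a per-utterance WER of 37.5%. The 22.80% figure in the abstract is an aggregate over the whole test set, not an average of such per-utterance values.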