Speech and Multimodal Interfaces Laboratory

Results of the INTERSPEECH 2022 conference

The 23rd International Conference INTERSPEECH 2022 was held from September 18 to 20 in Incheon, South Korea, in a hybrid format. It is the world's largest conference dedicated to the science and technology of spoken language processing, and this year's theme was "Human and Humanizing Speech Technology".

Virtual participants presented in poster format: they were asked to prepare a 15-minute video of their presentation, as well as an A0-size poster. In addition, on the day of their session, virtual participants had to be present on the Gather platform to answer questions about their presentations. This platform is designed to make virtual interaction resemble face-to-face interaction. The organizers built a system of rooms for each session, in which each presentation was assigned a specific place, marked by its poster.

The conference covered topics such as:

Speech Perception, Production and Acquisition;
Phonetics, Phonology and Prosody;
Analysis of Paralinguistics in Speech and Language;
Speaker and Language Identification;
Analysis of Speech and Audio Signals;
Speech Coding and Enhancement;
Speech Synthesis and Spoken Language Generation;
Speech Recognition: Signal Processing, Acoustic Modeling, Robustness, Adaptation, Architecture, Search, and Linguistic Components;
Spoken Language Processing: Translation, Information Retrieval, Summarization, Resources and Evaluation.

Three of our papers were published in the conference proceedings, one of them co-authored with international colleagues.

Markitantov M., Ryumina E., Ryumin D., Karpov A. Biometric Russian Audio-Visual Extended MASKS (BRAVE-MASKS) Corpus: Multimodal Mask Type Recognition Task // In Proc. of INTERSPEECH. 2022. pp. 1756-1760. DOI: 10.21437/Interspeech.2022-10240.
Velichko A., Markitantov M., Kaya H., Karpov A. Complex Paralinguistic Analysis of Speech: Predicting Gender, Emotions and Deception in a Hierarchical Framework // In Proc. of INTERSPEECH. 2022. pp. 4735-4739. DOI: 10.21437/Interspeech.2022-11294.
Ivanko D., Ryumin D., Kashevnik A., Axyonov A., Kitenko A., Lashkov I., Karpov A. DAVIS: Driver’s Audio-Visual Speech recognition // In Proc. of INTERSPEECH. 2022. pp. 1141-1142.