Speech and Multimodal Interfaces Laboratory

Software and infoware for intelligent analysis of video and audio information for assistive mobile systems in vehicles

In recent years, vehicle automation and intelligent vehicle systems have attracted growing interest. The main factor driving research in these areas is the high accident rate on public roads, both in Russia and abroad; more than half of road accidents are caused by human factors. To address the problem of the driver’s hands and visual attention being diverted from driving, we propose a new approach based on a contactless voice interface for interacting with in-vehicle assistive systems. Existing voice control systems differ in the set of supported languages, the number of recognizable commands, the number of implemented functions, and so on. However, they have one thing in common: they perform poorly under strong acoustic noise, which is typical for vehicles in traffic, especially at high speed. In acoustically noisy environments, visual information about speech plays an important role. To improve the quality and robustness of automatic speech recognition in noisy traffic environments, we propose to develop and study an audio-visual speech recognition system based on the joint processing of video and audio information, integrating modern computer vision methods for automatic lip-reading with methods for analyzing acoustic information.

The three-year research project is expected to yield a number of fundamentally new scientific and technical results: (1) new infoware, namely a multi-speaker audio-visual corpus (bimodal speech database) of Russian speech with multi-angle video data and microphone audio data, containing recordings of dozens of native Russian speakers; (2) software for recording multi-angle video and audio information, and software for audio-visual Russian speech recognition that enables automatic recognition of Russian speech in interactive applications with small and medium vocabularies; (3) software for recording data from accelerometer and speed sensors, synchronized with the video and audio recordings, to track the psycho-emotional state of the driver. Systems of this kind, which combine the processing of video data and microphone audio data for robust recognition of Russian speech in acoustically noisy traffic conditions, have not previously been developed either in Russia or abroad.

Project head
Number
N 19-29-09081-mk
Period
2019-2022
Funding
Russian Foundation for Basic Research (RFBR)