Speech and Multimodal Interfaces Laboratory

Intelligent system for multimodal recognition of human's affective states

This interdisciplinary project of the Russian Science Foundation addresses the problems of multimodal analysis and recognition of human affective states from observed behavior, using modern methods of digital signal processing and deep machine learning. Affective computing is highly relevant and significant from scientific, technical, and practical points of view. Many problems in this area remain unsolved, and the practical application of systems that recognize human affective states from unimodal data alone (for example, only audio or only video) suffers from a number of significant limitations.

The most natural way for a person to interact and exchange information is multimodal communication, which involves several modalities (communication channels) at once, including natural speech and sounds, facial expressions and articulation, hand and body gestures, gaze direction, general behavior, textual information, etc. Multimodal systems for analyzing human affective states therefore have significant advantages over unimodal methods: they can operate in difficult conditions, such as noise in one of the information channels (acoustic noise or a lack of lighting), and even in the complete absence of information in one channel (the person is silent or is not facing the camera). In addition, multimodal analysis often makes it possible to recognize ambiguous affective phenomena such as sarcasm and irony, which are characterized by a clear mismatch between the meaning of an utterance (text analysis), its voice intonation (audio analysis), and the accompanying facial expressions (video analysis).

Consequently, the simultaneous analysis of several components of human behavior (speech, facial expressions, gestures, gaze direction, text transcriptions of utterances) will improve the quality and recognition accuracy of automatic systems for analyzing affective states in tasks such as recognizing emotions, sentiment, aggression, and depression.
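The robustness argument above can be illustrated with a minimal late-fusion sketch. This is not the project's actual implementation; all function names, emotion labels, and weights here are hypothetical. The idea is that per-modality predictions are combined by a weighted average, and any unavailable channel (the person is silent, the face is not in view) is simply excluded, with the remaining weights renormalized:

```python
# Illustrative sketch only: weighted late fusion of per-modality emotion
# predictions with graceful handling of missing channels. The labels,
# weights, and function names are hypothetical, not the project's API.
from typing import Dict, List, Optional

EMOTIONS = ["neutral", "happy", "angry", "sad"]

def fuse_predictions(
    modality_probs: Dict[str, Optional[List[float]]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted-average fusion over the modalities that are available.

    A modality whose prediction is None (e.g. no speech, no visible face)
    is skipped and the remaining weights are renormalized, so the system
    degrades gracefully instead of failing outright.
    """
    fused = [0.0] * len(EMOTIONS)
    total_weight = 0.0
    for name, probs in modality_probs.items():
        if probs is None:  # channel unavailable: exclude it from fusion
            continue
        w = weights.get(name, 1.0)
        total_weight += w
        for i, p in enumerate(probs):
            fused[i] += w * p
    if total_weight == 0.0:
        raise ValueError("no modality available")
    return [p / total_weight for p in fused]

# Example: the video channel is missing (None), so the decision relies on
# the audio and text channels alone.
probs = fuse_predictions(
    {
        "audio": [0.1, 0.2, 0.6, 0.1],
        "video": None,
        "text":  [0.2, 0.1, 0.5, 0.2],
    },
    weights={"audio": 0.4, "video": 0.4, "text": 0.2},
)
print(EMOTIONS[max(range(len(probs)), key=probs.__getitem__)])  # prints "angry"
```

In practice such systems typically learn the fusion (e.g. with attention over modality embeddings) rather than using fixed weights, but the same principle of excluding absent channels applies.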
All these tasks are of great practical importance in the field of emotional artificial intelligence (Emotional AI), as well as in psychology, medicine, banking, forensic science, cognitive science, and related areas, and they carry high scientific, technical, social, and economic significance.

The main goal of this RSF project is to develop and study a new intelligent computer system for multimodal analysis of human behavior that recognizes manifested affective states from a person's audio, video, and text data. A distinctive feature of the system will be its multimodality, i.e. the simultaneous automatic analysis of the user's speech and image, as well as the meaning of their utterances, in order to determine various psychoemotional (affective) states, including emotions, sentiment, aggression, and depression. The target audience of the system under development includes not only the Russian-speaking population but also other representative groups regardless of gender, age, race, and language. The study is therefore relevant and large-scale within both Russian and world science.

The main objectives of this project are the development and the theoretical and experimental research of the infoware, mathematical, and software support for the intelligent system of multimodal analysis of people's affective behavior.

To achieve the main goal of the project, the following tasks must be solved, grouped into three sequential stages of work:

  1. development of infoware and mathematical support for the intelligent system of multimodal analysis of affective states (2022);
  2. development and research of mathematical and software support for the intelligent system of multimodal analysis of affective states (2023);
  3. experimental study and evaluation of the intelligent system for multimodal analysis of affective states, development of a system demonstrator and generalization of results (2024).
Project head
Project No. 22-11-00321
Russian Science Foundation