Speech and Multimodal Interfaces Laboratory

Analysis of Voice and Facial Features of a Human in a Mask

Given the unexpected emergence and rapid global spread of the COVID-19 coronavirus pandemic, one of the most urgent tasks is to monitor the safety of individuals and of society as a whole in the new world of social distancing and "mask" culture. In recent years, periodically wearing protective face masks in public places has become familiar and commonplace for many residents of densely populated Asian countries (Japan, Singapore, Malaysia, China, etc.), who have used them to protect themselves from people with possible respiratory diseases, from air pollution and from allergens. This mask culture, together with the population's strict observance of quarantine requirements, became a key factor in curbing the spread of COVID-19 in these Asian countries. In recent months, masks have also become part of European culture and even fashion, firmly entering our dress code. Now and in the coming years, there is an urgent need for automated verification that people in public places, or people in contact with infected persons or those at risk of infection, are wearing protective masks. As part of this RFBR project, it is proposed to develop and study a new software system for automatic bimodal analysis of the voice and facial characteristics of a masked person.

A number of fundamentally new scientific and technical results will be obtained during the 2-year research project: (1) new infoware, namely a bimodal Russian-language database (corpus) containing multi-angle images of the faces of people wearing various types of protective masks, as well as audio recordings of dozens of native Russian speakers in masks, including disposable medical masks of various densities, reusable fabric masks of various colors with and without prints, special respirators and other means of protecting the mucous surfaces of the face; (2) new methods and models for automatic analysis of people's voice characteristics from speech, including detecting whether a protective mask is worn while speaking, detecting coughs, estimating the likelihood of a respiratory disease, etc.; (3) new methods and models for analyzing people's facial characteristics from video data, including detecting the presence or absence of a protective mask on the face and extracting biometric characteristics of the uncovered part of the face (the upper part of the head); (4) a prototype software system for automatic bimodal analysis of the voice and facial characteristics of a person in a mask.

The results of these studies, which are based on modern artificial intelligence technologies, can be directly applied to combat the spread of viral epidemics (coronaviruses including COVID-19, influenza viruses, and other highly pathogenic viruses that may emerge in the future) both in Russia and around the world.

Results for 2020

At the first stage of RFBR project № 20-04-60529, an extended analytical review was carried out covering protective mask detection on the human face from voice and facial characteristics, respiratory diseases, automatic COVID-19 recognition from human speech and other sounds, and currently available audio-visual speech corpora. In addition, software was developed for recording video data in order to collect and annotate a bimodal corpus containing multi-angle video of people wearing various types of protective masks, together with audio recordings of continuous Russian speech of people in masks. A distinctive feature of this software is its ability to capture and record video data simultaneously from several mobile devices in parallel (up to 3 devices). A new methodology for creating audio-visual speech corpora was also proposed; it allows recording multi-angle data corpora and spontaneous speech.

To address the fundamental problem of detecting protective masks on a person's face from voice and facial characteristics, a bimodal Russian-language database (corpus), BRAVE-MASKS, was recorded, which includes recordings of 30 native Russian speakers. The corpus contains 44,820 video files (with audio) in MOV format, 180 files with orthographic transcriptions of the spoken phrases in TXT format, and about 2 million frame-by-frame images extracted from the video recordings in JPG format. The corpus was recorded using two smartphones and one tablet, controlled by the developed software for the iOS operating system. In addition, preliminary results were obtained on automatic recognition of the presence or absence of a protective mask on a person's face from both video and audio information. Finally, an approach to creating a synthetic dataset by overlaying masks on images of human faces is presented.
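
The report does not specify how masks are overlaid on face images for the synthetic dataset. As a rough illustration of the general idea only, the following minimal Python sketch alpha-composites an RGBA mask image onto an RGB face image pixel by pixel. The function `overlay_mask`, the list-of-tuples pixel representation and the `top`/`left` placement parameters are hypothetical; a practical pipeline would more likely use an image library and would position and scale the mask from detected facial landmarks, which this sketch does not do.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def overlay_mask(
    face: List[List[RGB]], mask: List[List[RGBA]], top: int, left: int
) -> List[List[RGB]]:
    """Alpha-composite an RGBA mask onto an RGB face (hypothetical helper).

    `top`/`left` give the mask's upper-left corner in face coordinates.
    In a real system they would come from facial landmarks; here they
    are simply passed in by the caller.
    """
    # Work on a row-by-row copy so the caller's face image is not modified.
    out = [row[:] for row in face]
    for y, mask_row in enumerate(mask):
        fy = top + y
        if not 0 <= fy < len(out):
            continue  # mask row falls outside the face image: clip it
        for x, (mr, mg, mb, ma) in enumerate(mask_row):
            fx = left + x
            if not 0 <= fx < len(out[fy]):
                continue  # mask column falls outside the face image: clip it
            alpha = ma / 255.0  # 0 = fully transparent, 1 = fully opaque
            fr, fg, fb = out[fy][fx]
            # Standard "over" blend: mask colour weighted by alpha,
            # underlying face colour weighted by (1 - alpha).
            out[fy][fx] = (
                round(alpha * mr + (1 - alpha) * fr),
                round(alpha * mg + (1 - alpha) * fg),
                round(alpha * mb + (1 - alpha) * fb),
            )
    return out


if __name__ == "__main__":
    face = [[(100, 100, 100)] * 4 for _ in range(4)]  # uniform grey 4x4 "face"
    mask = [
        [(255, 255, 255, 255), (0, 0, 0, 0)],  # opaque white, transparent
        [(0, 0, 255, 128), (10, 20, 30, 255)],  # half-transparent blue, opaque
    ]
    for row in overlay_mask(face, mask, top=2, left=1):
        print(row)
```

With a uniform grey face, fully opaque mask pixels replace the underlying face pixel, fully transparent ones leave it unchanged, and partially transparent ones blend the two; mask pixels that fall outside the face image are simply skipped.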

During the reporting period, a series of articles based on the project results were prepared and published in proceedings and journals indexed in Scopus and Web of Science, including the proceedings of the international conferences INTERSPEECH (rank A in the CORE ranking), PSBB (ISPRS International Workshop “Photogrammetric and computer vision techniques for video Surveillance, Biometrics and Biomedicine”) and Zavalishin’s Readings, as well as a review article in the Russian journal “Informatics and Automation” (indexed in Scopus). The developed software and database (BRAVE-MASKS, Biometric Russian Audio-Visual Extended MASKS corpus) have been officially registered with Rospatent.

Internet resources prepared for the project:

  1. A bimodal Russian-language database (corpus) of persons in protective masks (BRAVE-MASKS - Biometric Russian Audio-Visual Extended MASKS corpus)
  2. Software for recording audiovisual data of persons in protective masks
  3. Ryumina E., Ryumin D., Ivanko D., Karpov A. A Novel Method for Protective Face Mask Detection Using Convolutional Neural Networks and Image Histograms // International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. – 2021. – Vol. XLIV-2/W1-2021. – pp. 177-182. DOI: 10.5194/isprs-archives-XLIV-2-W1-2021-177-2021
  4. Markitantov M., Dresvyanskiy D., Mamontov D., Kaya H., Minker W., Karpov A. Ensembling End-to-End Deep Models for Computational Paralinguistics Tasks: ComParE 2020 Mask and Breathing Sub-Challenges // INTERSPEECH 2020. – 2020. – pp. 2072-2076. DOI: 10.21437/Interspeech.2020-2666
  5. Ryumina E., Verkholyak O., Karpov A. Annotation Confidence vs. Training Sample Size: Trade-Off Solution for Partially-Continuous Categorical Emotion Recognition // INTERSPEECH 2021. – 2021. – pp. 3690-3694. DOI: 10.21437/Interspeech.2021-1636
  6. Letenkov M., Iakovlev R., Karpov A. Approach to Image-Based Recognition of User Face in Setting of Partial Face Occlusion by Personal Protective Equipment // Electromechanics and Robotics. Smart Innovation, Systems and Technologies. – Vol. 232. – 2021. – pp. 249-258. DOI: 10.1007/978-981-16-2814-6_22

Project's head:
Number: N 20-04-60529-viruses
Period: 2020-2022
Financing: Russian Foundation for Basic Research (RFBR)