Paper in the international journal Expert Systems with Applications (Q1)
Our laboratory has published an article in the international journal Expert Systems with Applications (Scopus, Q1):
Markitantov M., Ryumina E., Karpov A. Audio-visual occlusion-robust gender recognition and age estimation approach based on multi-task cross-modal attention // Expert Systems with Applications, 2026, vol. 296, 127473. (WOS IF=7.5 Q1, Scopus SJR=1.85 Q1 AI)
Gender recognition and age estimation are essential tasks in soft-biometric systems. Real-world conditions such as partial facial occlusion hinder these tasks by obscuring crucial voice and facial cues, motivating robust and efficient solutions. We present ORAGEN, an audio-visual Occlusion-Robust GENder recognition and AGE estimation approach built on intermediate features of unimodal transformer models and two Multi-Task Cross-Modal Attention (MTCMA) blocks, which jointly predicts gender, age, and protective-mask type from voice and facial characteristics. We conduct extensive cross-corpus experiments on TIMIT, aGender, CommonVoice, LAGENDA, IMDB-Clean, AFEW, VoxCeleb2, and BRAVE-MASKS. The proposed unimodal models outperform state-of-the-art baselines for gender and age; we also analyze the effect of mask types. Performance is reported as Unweighted Average Recall (UAR) for the classification tasks and Mean Absolute Error (MAE) for age. On VoxCeleb2 (Test), ORAGEN achieves UAR=99.51% (gender), MAE=5.42 (age), and UAR=100% (mask type); on BRAVE-MASKS (Test), the corresponding figures are UAR=96.63%, MAE=7.52, and UAR=95.87%. The results indicate that including masked-face data and adding the mask-type task improve performance on all three targets. ORAGEN can be integrated into expert systems (e.g., OCEAN-AI) for applications in forensics, healthcare, and industrial safety.
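For readers unfamiliar with the two metrics quoted above, the following minimal sketch shows how they are conventionally computed: UAR (Unweighted Average Recall) is the mean of the per-class recalls, and MAE (Mean Absolute Error) is the mean absolute difference between predicted and true values. The function names and the toy labels are illustrative assumptions and are not taken from the paper or its code.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls: every class counts equally, whatever its size."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between predicted and true values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (hypothetical, for illustration only).
gender_true = ["f", "f", "f", "m"]
gender_pred = ["f", "f", "m", "m"]
age_true = [25, 40, 63]
age_pred = [30, 38, 60]

uar = unweighted_average_recall(gender_true, gender_pred)
mae = mean_absolute_error(age_true, age_pred)
print(f"UAR = {uar:.4f}")  # recalls: f = 2/3, m = 1/1 -> 0.8333
print(f"MAE = {mae:.4f}")  # errors: 5, 2, 3 -> 3.3333
```

In this toy example plain accuracy would be 0.75, while UAR is about 0.83. The two differ because UAR weights each class equally, which is why it is often preferred when classes are imbalanced.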