We took 1st place in the ABAW-2025 Compound Expression Challenge at ICCV-2025
We participated in the ABAW 2025 – Compound Expression Recognition Challenge and achieved 1st place. The challenge is part of the 9th Workshop & Competition on Affective & Behavior Analysis in-the-wild (ABAW) at ICCV 2025.
ICCV 2025 dates: October 19–23, 2025, Honolulu, Hawaii, USA. Workshop days are October 19–20. (Source: The Computer Vision Foundation / ICCV.)
About the Challenge: Compound Expression Recognition (ABAW-2025)
This track targets compound emotion recognition in in-the-wild videos. Evaluation uses a subset of the C-EXPR-DB audiovisual dataset (56 videos). The primary metric is frame-level macro-F1: the F1 score is computed per class over all frames and then averaged across classes with equal weight.
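To make the metric concrete, here is a minimal sketch of macro-F1 over frame-level labels. The function name `macro_f1`, the integer class encoding, and the convention of scoring a class with no true or predicted frames as 0 are assumptions for illustration. The official evaluation script may handle such edge cases differently.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 over frame-level labels.

    y_true, y_pred: sequences of integer class ids, one per frame.
    """
    tp = [0] * num_classes  # true positives per class
    fp = [0] * num_classes  # false positives per class
    fn = [0] * num_classes  # false negatives per class

    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    f1_scores = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Assumption: a class that never appears and is never predicted scores 0.
        f1_scores.append(2 * tp[c] / denom if denom else 0.0)

    return sum(f1_scores) / num_classes


# Toy example with three classes and five frames.
score = macro_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)
print(f"macro-F1 = {score:.4f}")
```

Because every class contributes equally to the average, rare compound expressions weigh as much as frequent ones, so a model cannot score well by predicting only the majority class.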
Recognized compound expressions
- Fearfully Surprised
- Happily Surprised
- Sadly Surprised
- Disgustedly Surprised
- Angrily Surprised
- Sadly Fearful
- Sadly Angry
Paper abstract
Compound Expression Recognition (CER), a subfield of affective computing, aims to detect complex emotional states formed by combinations of basic emotions. In this work, we present a novel zero-shot multimodal approach for CER that combines six heterogeneous modalities into a single pipeline: static and dynamic facial expressions, scene and label matching, scene context, audio, and text. Unlike previous approaches relying on task-specific training data, our approach uses zero-shot components, including Contrastive Language-Image Pretraining (CLIP)-based label matching and Qwen-VL for semantic scene understanding. We further introduce a Multi-Head Probability Fusion (MHPF) module that dynamically weights modality-specific predictions, followed by basic-to-compound emotion conversion that uses Pair-wise Probability Aggregation (PPA) or Pair-wise Feature Similarity Aggregation (PFSA) methods to produce interpretable compound emotion outputs. Evaluated under multi-corpus training, the proposed approach achieves macro-F1 scores of 46.95% on AffWild2, 49.02% on Acted Facial Expressions in The Wild (AFEW), and 34.85% on C-EXPR-DB via zero-shot testing, comparable to supervised approaches trained on target data. Thus, our approach effectively captures Compound Expressions (CE) without domain adaptation. The source code is publicly available at https://github.com/SMIL-SPCRAS/ICCVW_25.
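The basic-to-compound conversion mentioned in the abstract can be illustrated with a small sketch. The abstract does not state the exact aggregation rule, so this is only one plausible reading of "pair-wise probability aggregation": each compound class is scored by summing the probabilities of its two component basic emotions, and the scores are then normalized. The mapping `COMPOUND_PAIRS`, the function `pairwise_aggregate`, the basic-emotion names, and the summation rule are all assumptions for illustration, not the authors' implementation. See the linked repository for the actual method.

```python
# Hypothetical mapping from each compound class in the challenge
# to its two component basic emotions.
COMPOUND_PAIRS = {
    "Fearfully Surprised": ("Fear", "Surprise"),
    "Happily Surprised": ("Happiness", "Surprise"),
    "Sadly Surprised": ("Sadness", "Surprise"),
    "Disgustedly Surprised": ("Disgust", "Surprise"),
    "Angrily Surprised": ("Anger", "Surprise"),
    "Sadly Fearful": ("Sadness", "Fear"),
    "Sadly Angry": ("Sadness", "Anger"),
}


def pairwise_aggregate(basic_probs):
    """Convert basic-emotion probabilities to compound-class probabilities.

    Assumed rule: compound score = sum of its two component probabilities,
    followed by normalization so the compound scores sum to 1.
    """
    scores = {
        name: basic_probs[a] + basic_probs[b]
        for name, (a, b) in COMPOUND_PAIRS.items()
    }
    total = sum(scores.values())
    if total == 0:
        # Fall back to a uniform distribution when all inputs are zero.
        return {name: 1 / len(scores) for name in scores}
    return {name: s / total for name, s in scores.items()}


# Toy basic-emotion distribution for a single frame.
frame_probs = {
    "Anger": 0.05, "Disgust": 0.05, "Fear": 0.10,
    "Happiness": 0.40, "Sadness": 0.05, "Surprise": 0.35,
}
compound = pairwise_aggregate(frame_probs)
print(max(compound, key=compound.get))
```

A rule of this kind keeps the output interpretable: each compound prediction can be traced back to the two basic-emotion probabilities that produced it.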