Conference papers

Multi-modal Fusion for Continuous Emotion Recognition by Using Auto-Encoders

Abstract : Human stress detection is of great importance for monitoring mental health. The Multimodal Sentiment Analysis Challenge (MuSe) 2021 focuses on emotion, physiological-emotion, and stress recognition as well as sentiment classification by exploiting several modalities. In this paper, we present our solution for the MuSe-Stress sub-challenge. The target of this sub-challenge is the continuous prediction of arousal and valence for people under stressful conditions, for which text transcripts and audio and video recordings are provided. To this end, we utilize bidirectional Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks to explore high-level and low-level features from the different modalities. We employ the Concordance Correlation Coefficient (CCC) as both the loss function and the evaluation metric for our model. To improve the unimodal predictions, we add difficulty indicators of the data obtained by using Auto-Encoders. Finally, we perform late fusion on our unimodal predictions together with the difficulty indicators to obtain our final predictions. With this approach, we achieve a CCC of 0.4278 for arousal and 0.5951 for valence; our submission to MuSe 2021 ranks in the top three for arousal and fourth for valence.
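For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of the standard Concordance Correlation Coefficient, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), and of one common way to turn it into a loss (1 − CCC). It is not the authors' implementation: the function names `ccc` and `ccc_loss`, the use of population (divide-by-n) statistics, and the handling of the zero-denominator case are assumptions made for illustration only.

```python
def ccc(pred, gold):
    """Concordance Correlation Coefficient between two equal-length sequences.

    Illustrative sketch only; uses population (divide-by-n) variance and covariance.
    """
    n = len(pred)
    if n == 0 or n != len(gold):
        raise ValueError("pred and gold must be non-empty and of equal length")

    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        # Both sequences are the same constant; CCC is undefined here.
        # Returning 0.0 is a convention chosen for this sketch (assumption).
        return 0.0
    return 2.0 * cov / denom


def ccc_loss(pred, gold):
    """Loss that decreases as agreement improves (assumed form: 1 - CCC)."""
    return 1.0 - ccc(pred, gold)
```

Unlike Pearson correlation, CCC penalizes shifts in mean and scale: `ccc([1, 2, 3], [1, 2, 3])` is 1, whereas `ccc([1, 2, 3], [2, 3, 4])` is 4/7 (about 0.571) even though the two sequences are perfectly correlated.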
Document type :
Conference papers
Contributor : Contributeur MAP CEA
Submitted on : Friday, January 7, 2022 - 4:12:27 PM
Last modification on : Wednesday, January 12, 2022 - 3:25:19 AM
Long-term archiving on : Friday, April 8, 2022 - 7:45:33 PM


 Restricted access
To satisfy the distribution rights of the publisher, the document is embargoed until : 2023-01-19



Distributed under a Creative Commons Attribution - NonCommercial - ShareAlike 4.0 International License




Salam Hamieh, Vincent Heiries, Hussein Al Osman, Christelle Godin. Multi-modal Fusion for Continuous Emotion Recognition by Using Auto-Encoders. MM '21: ACM Multimedia Conference, Oct 2021, Virtual Event, China. pp.21-27, ⟨10.1145/3475957.3484455⟩. ⟨cea-03517175⟩


