Exploiting multi-CNN features in CNN-RNN based dimensional emotion recognition on the OMG in-the-wild dataset

Kollias, Dimitrios (ORCID: https://orcid.org/0000-0002-8188-3751) and Zafeiriou, Stefanos P. (2020) Exploiting multi-CNN features in CNN-RNN based dimensional emotion recognition on the OMG in-the-wild dataset. IEEE Transactions on Affective Computing, 12 (3). pp. 595-606. ISSN 1949-3045 (Online) (doi:10.1109/TAFFC.2020.3014171)

PDF (Author Accepted Manuscript)
29426 KOLLIAS_Exploiting_Multi-CNN_Features_(AAM)_2020.pdf - Accepted Version


Abstract

This paper presents a novel CNN-RNN based approach that exploits multiple CNN features for dimensional emotion recognition in-the-wild, using the One-Minute Gradual-Emotion (OMG-Emotion) dataset. The approach first pre-trains the network on the large, closely related Aff-Wild and Aff-Wild2 emotion databases. Low-, mid- and high-level features are then extracted from the trained CNN component and exploited by RNN subnets in a multi-task framework. The outputs of these subnets constitute intermediate-level predictions; the final estimates are obtained as the mean or median of these predictions. Fusion of the networks is also examined as a way of boosting performance, either at Decision-level or at Model-level; in the latter case an RNN is used for the fusion. Although it uses only the visual modality, our approach outperformed state-of-the-art methods that used both audio and visual modalities. Some of our developments were submitted to the OMG-Emotion Challenge, ranking second among the methods that used only visual information for valence estimation, and third overall. Through extensive experimentation, we further show that arousal estimation is greatly improved when low-level features are combined with high-level ones.
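
The aggregation step described in the abstract, where the final estimate is the mean or median of the intermediate predictions, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `aggregate_predictions`, the representation of each subnet output as a (valence, arousal) pair, and the sample values are all assumptions made for illustration only.

```python
from statistics import mean, median


def aggregate_predictions(subnet_outputs, method="mean"):
    """Combine intermediate (valence, arousal) predictions into one estimate.

    subnet_outputs: list of (valence, arousal) pairs, one per RNN subnet.
    This layout is a hypothetical simplification, not taken from the paper.
    method: "mean" or "median", the two options named in the abstract.
    """
    if not subnet_outputs:
        raise ValueError("at least one subnet prediction is required")
    if method not in ("mean", "median"):
        raise ValueError("method must be 'mean' or 'median'")

    reducer = mean if method == "mean" else median
    valences = [v for v, _ in subnet_outputs]
    arousals = [a for _, a in subnet_outputs]
    # Valence and arousal are reduced independently across the subnets.
    return reducer(valences), reducer(arousals)


# Made-up predictions from three subnets (e.g. fed by low-, mid- and
# high-level features); the numbers are illustrative, not results.
outputs = [(0.25, -0.5), (0.5, 0.0), (0.75, 0.25)]
final_mean = aggregate_predictions(outputs, "mean")
final_median = aggregate_predictions(outputs, "median")
```

The median is less sensitive than the mean to a single subnet producing an outlying prediction, which is one plausible reason to consider both options.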

Item Type: Article
Uncontrolled Keywords: Deep convolutional and recurrent neural architectures, CNN plus Multi RNN, low-, mid-, high-level features, multi-CNN feature extraction and aggregation, multi-task learning, facial image analysis, valence, arousal, emotion recognition in-the-wild, AffWildNet, AffWild and AffWild2 emotion databases, OMG-Emotion database and Challenge
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Faculty / School / Research Centre / Research Group: Faculty of Engineering & Science
Faculty of Engineering & Science > School of Computing & Mathematical Sciences (CMS)
Last Modified: 23 May 2022 10:30
URI: http://gala.gre.ac.uk/id/eprint/29426
