Cross-Modal Dynamic Transfer Learning for Multimodal Emotion Recognition
Blog Article
Multimodal Emotion Recognition is an important research area for developing human-centric applications, especially in the context of video platforms. Most existing models attempt to develop sophisticated fusion techniques that integrate heterogeneous features from different modalities. However, these fusion methods can hurt performance, since not all modalities help establish the semantic alignment needed for emotion prediction. We observed that 8.0% of the instances misclassified by an existing fusion model are predicted correctly when one of the input modalities is masked.
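To make this masking probe concrete, here is a minimal PyTorch-style sketch of how one could measure it. Everything here is an illustrative assumption rather than the paper's code: the `fusion_model` interface, the `MODALITIES` keys, and zero-masking as the masking scheme.

```python
import torch

MODALITIES = ["text", "audio", "vision"]  # hypothetical modality keys

@torch.no_grad()
def masking_probe(fusion_model, inputs, labels):
    """Fraction of misclassified examples that become correct when
    one input modality is zero-masked at a time (sketch).
    inputs: dict mapping each modality key to its feature tensor."""
    fusion_model.eval()
    base_pred = fusion_model(**inputs).argmax(dim=-1)
    wrong = base_pred != labels                 # originally misclassified
    recovered = torch.zeros_like(wrong)
    for m in MODALITIES:
        masked = {k: (torch.zeros_like(v) if k == m else v)
                  for k, v in inputs.items()}
        pred_m = fusion_model(**masked).argmax(dim=-1)
        # "recovered" if masking any single modality fixes the prediction
        recovered |= wrong & (pred_m == labels)
    return recovered.float().sum() / wrong.float().sum()
```

A probe like this only diagnoses the problem; the point of CDaT, described next, is to exploit it during training rather than at test time.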
Based on this observation, we propose a representation learning method called Cross-modal DynAmic Transfer learning (CDaT), which dynamically filters the low-confident modality and complements it with the high-confident modality using uni-modal masking and cross-modal representation transfer learning. We train an auxiliary network that learns model confidence scores to determine which modality is low-confident and how much transfer should occur from the other modalities. Furthermore, CDaT can be used with any fusion model in a model-agnostic way, because it transfers low-level uni-modal information via a probabilistic knowledge transfer loss. Experiments with four different state-of-the-art fusion models on the CMU-MOSEI and IEMOCAP datasets demonstrate the effect of CDaT on emotion recognition.
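Below is a rough sketch of what the dynamic transfer could look like in training code, under our own assumptions: the confidence head `conf_net`, the batch-level confidence gap, and the PKT-style similarity matching are illustrative choices, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def pkt_loss(student, teacher, eps=1e-7):
    """Probabilistic knowledge transfer (sketch): match the pairwise
    cosine-similarity distributions of two feature batches."""
    def sim_dist(x):
        x = F.normalize(x, dim=-1)
        sim = (x @ x.t() + 1.0) / 2.0            # shift cosine into [0, 1]
        return sim / sim.sum(dim=-1, keepdim=True)
    p, q = sim_dist(teacher), sim_dist(student)
    return (p * torch.log((p + eps) / (q + eps))).sum(dim=-1).mean()

def dynamic_transfer_loss(feats, conf_net):
    """feats: dict of uni-modal features, e.g. {"text": [B, D], ...}.
    conf_net maps a feature batch to confidence logits; we reduce to a
    batch-mean scalar per modality for simplicity. Transfer flows from
    higher- to lower-confident modalities, weighted by the gap."""
    conf = {m: torch.sigmoid(conf_net(f)).mean() for m, f in feats.items()}
    loss = 0.0
    for lo, f_lo in feats.items():
        for hi, f_hi in feats.items():
            gap = (conf[hi] - conf[lo]).clamp(min=0.0)  # only high -> low
            if gap > 0:
                loss = loss + gap * pkt_loss(f_lo, f_hi.detach())
    return loss
```

Detaching the high-confident features makes the transfer one-directional, so the stronger modality acts as a teacher for the weaker one; because the loss touches only the low-level uni-modal features, it can be added on top of any fusion architecture.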