Multi-modal Multi-label Emotion Recognition with Heterogeneous Hierarchical Message Passing

Authors: Dong Zhang, Xincheng Ju, Wei Zhang, Junhui Li, Shoushan Li, Qiaoming Zhu, Guodong Zhou

AAAI 2021, pp. 14338-14346 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Detailed evaluation demonstrates the effectiveness of our approach. Systematic experimentation on two benchmark datasets show that our approach can effectively address the challenges faced by MMER, and advance the state-of-the-art with a large margin.
Researcher Affiliation | Collaboration | Dong Zhang (1), Xincheng Ju (1), Wei Zhang (2), Junhui Li (1), Shoushan Li (1), Qiaoming Zhu (1), Guodong Zhou (1)*; (1) School of Computer Science and Technology, Soochow University, China; (2) Alibaba Group, China
Pseudocode | No | The paper describes its method using mathematical formulations and architectural diagrams but does not provide a formal pseudocode block or algorithm.
Open Source Code | Yes | To motivate future research, both code and dataset will be released. (Footnote 4: https://github.com/MANLP-suda/HHMPN)
Open Datasets | Yes | 1) MOSEI is the only public benchmark for MMER in English. The document-level videos of this dataset are segmented into utterances with three modalities, i.e., the textual, visual and acoustic modalities, while the emotion categories contain happiness, sadness, anger, fear, disgust and surprise. ... For MOSEI, we refer to the original paper (Zadeh et al. 2018). ... 2) To further demonstrate the generalization of our approach, we collect a partial time series dataset for MMER from Net Ease Cloud Music (footnote 1: music.163.com), namely NEMu. ... (See the data-layout sketch after the table.)
Dataset Splits | Yes | Train/Valid/Test sizes: MOSEI 16326/1871/4659; NEMu 15125/1891/1891. During training, we train each model for a fixed number of epochs 50 and monitor its performance on the validation set. Once the training is finished, we select the model with the best F1 score on the validation set as our final model and evaluate its performance on the test set. (See the model-selection sketch after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions tools like Librosa and ResNet, and optimizers like Adam, but does not provide specific version numbers for any software dependencies. (See the version-logging sketch after the table.)
Experiment Setup | Yes | For both datasets, we use the same hyper-parameters: the size d of the hidden layer in each modality is 256, iteration times T is set 3, batch size is 64 and λ in joint loss is 0.2. We train HHMPN in an end-to-end manner by minimizing the joint loss function with the Adam optimizer (Kingma and Ba 2015). Besides, we make use of the dropout regularization (Srivastava et al. 2014) to avoid overfitting and clip the gradients (Pascanu, Mikolov, and Bengio 2013) to the maximum norm of 10.0. During training, we train each model for a fixed number of epochs 50 and monitor its performance on the validation set. (See the training-step sketch after the table.)
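
Data-layout sketch. The Open Datasets row describes MOSEI utterances as three modality feature sequences (textual, visual, acoustic) annotated with a subset of six emotions. The snippet below is a minimal, hypothetical illustration of that multi-modal multi-label structure; the field names, feature dimensions, and label ordering are assumptions, not taken from the released code.

```python
from dataclasses import dataclass
import numpy as np

# The six MOSEI emotion labels named in the paper; this ordering is an assumption.
EMOTIONS = ["happiness", "sadness", "anger", "fear", "disgust", "surprise"]

@dataclass
class MMERExample:
    """One utterance with three modality feature sequences and a
    multi-hot emotion vector (multi-label, so several entries can be 1)."""
    text: np.ndarray      # (len_t, d_text)   token-level textual features
    visual: np.ndarray    # (len_v, d_visual) frame-level visual features
    acoustic: np.ndarray  # (len_a, d_audio)  frame-level acoustic features
    labels: np.ndarray    # (6,)              multi-hot vector over EMOTIONS

def encode_labels(active_emotions):
    """Convert a list of emotion names into a multi-hot target vector."""
    y = np.zeros(len(EMOTIONS), dtype=np.float32)
    for name in active_emotions:
        y[EMOTIONS.index(name)] = 1.0
    return y

# An utterance annotated with both sadness and anger:
print(encode_labels(["sadness", "anger"]))  # -> [0. 1. 1. 0. 0. 0.]
```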
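Model-selection sketch. The Dataset Splits row quotes the protocol of training for a fixed 50 epochs and keeping the checkpoint with the best validation F1. Below is a minimal sketch of that loop, assuming a PyTorch-style model and user-supplied train/predict helpers; the micro averaging choice and all helper names are assumptions.

```python
import copy
from sklearn.metrics import f1_score

def select_best_model(model, optimizer, train_one_epoch, predict,
                      train_loader, valid_loader, num_epochs=50):
    """Train for a fixed 50 epochs and keep the weights with the best
    validation F1, as described in the Dataset Splits row above."""
    best_f1, best_state = -1.0, None
    for _ in range(num_epochs):
        train_one_epoch(model, optimizer, train_loader)
        y_true, y_pred = predict(model, valid_loader)       # multi-hot label arrays
        score = f1_score(y_true, y_pred, average="micro")   # averaging mode assumed
        if score > best_f1:
            best_f1 = score
            best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)                        # restore best checkpoint
    return model, best_f1
```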
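Version-logging sketch. The Software Dependencies row notes that tools such as Librosa, ResNet, and Adam are named without version numbers. When re-running the released code, one way to record the versions actually installed is a small snippet like the following; the package list is only a guess based on the tools mentioned.

```python
from importlib import metadata

# Packages guessed from the tools named in the paper (Librosa for audio features,
# ResNet via torchvision for visual features, PyTorch/Adam for training).
for pkg in ["librosa", "torch", "torchvision", "numpy", "scikit-learn"]:
    try:
        print(f"{pkg}=={metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")
```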
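Training-step sketch. The Experiment Setup row gives concrete settings: hidden size d = 256, T = 3 message-passing iterations, batch size 64, λ = 0.2 in the joint loss, the Adam optimizer, dropout, and gradient clipping at a max norm of 10.0. The sketch below shows one way those values could be wired into a single optimization step; the model forward pass and the two-term decomposition of the joint loss are placeholders, not the authors' HHMPN implementation.

```python
from torch.nn.utils import clip_grad_norm_

# Hyper-parameters quoted in the Experiment Setup row
HIDDEN_SIZE = 256   # size d of the hidden layer in each modality
NUM_ITER    = 3     # message-passing iterations T
BATCH_SIZE  = 64
LAMBDA      = 0.2   # weight of the second term in the joint loss (assumed form)
MAX_NORM    = 10.0  # gradient clipping threshold
NUM_EPOCHS  = 50

def training_step(model, batch, optimizer, main_loss_fn, aux_loss_fn):
    """One Adam step on the joint loss with gradient clipping at 10.0.
    The model outputs and the two loss terms are hypothetical; the quoted
    text does not spell out how the joint loss is decomposed."""
    optimizer.zero_grad()
    main_out, aux_out = model(batch)                       # placeholder forward pass
    loss = (main_loss_fn(main_out, batch["labels"])
            + LAMBDA * aux_loss_fn(aux_out, batch["labels"]))
    loss.backward()
    clip_grad_norm_(model.parameters(), max_norm=MAX_NORM)
    optimizer.step()
    return loss.item()

# The optimizer would be torch.optim.Adam(model.parameters()), as in the paper.
```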