Adaptive Graph Learning for Multimodal Conversational Emotion Detection

Authors: Geng Tu, Tian Xie, Bin Liang, Hongpeng Wang, Ruifeng Xu

AAAI 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results show that AdaIGN outperforms state-of-the-art methods on two popular datasets.
Researcher Affiliation Academia 1 Harbin Institute of Technology, Shenzhen, China; 2 Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies; 3 The Chinese University of Hong Kong, Hong Kong, China; 4 Peng Cheng Laboratory, Shenzhen, China. tugeng0313@gmail.com, xuruifeng@hit.edu.cn
Pseudocode No The paper describes the model architecture and mathematical formulations but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code Yes Our code will be available at https://github.com/TuGengs/AdaIGN.
Open Datasets Yes We benchmark AdaIGN on two well-known conversational datasets: IEMOCAP (Busso et al. 2008) is a dataset of interactive emotional dyadic motion capture recordings with ten actors in dialogues... MELD (Poria et al. 2018) has multi-party conversation videos from the Friends TV series...
Dataset Splits Yes The data split of the datasets in Table 1 follows Ghosal et al. (2020a). As the IEMOCAP dataset does not come with a predefined train/validation split, we allocate 10% of the training dialogues for validation. Table 1: Statistics of two datasets. (Includes train, val, test numbers)
Hardware Specification Yes All experiments are conducted on a single Tesla V100s-PCIE-32GB GPU.
Software Dependencies No The paper mentions using models/toolkits such as RoBERTa-Large, openSMILE, and DenseNet for feature extraction, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions).
Experiment Setup Yes The learning rate is 3e-4 for IEMOCAP and 1e-3 for MELD. We train our model using a batch size of 32 conversations with the Adam optimizer. NESP and GSP are randomly initialized for policy initialization. For policy learning, we employ an Adam optimizer with a learning rate of 2e-2. For other hyperparameters, da is 1582 for IEMOCAP and 300 for MELD. dv=342, dt=1024, dh=200, and dm=100. γ=0.6, ϕ=0.2, ω=0.9, and µ=0.1. λη is 3 (a), 0 (v), and 1 (t) for IEMOCAP; and 0.5 (a), 0.5 (v), and 1.5 (t) for MELD. The number of GCN layers l is 16 for IEMOCAP and 32 for MELD. The selection policy distribution size is set to 200 * batch size, where 200 is the max sequence length.
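The experiment-setup row above can be collected into a small configuration sketch. This is a minimal, hedged illustration only: the dict name `HPARAMS`, the key names, and the helper `policy_distribution_size` are our own labels, not identifiers from the authors' code; the numeric values are those reported in the paper.

```python
# Hedged sketch of the reported per-dataset hyperparameters.
# Names (HPARAMS, "lr", "gcn_layers", ...) are illustrative assumptions;
# only the numeric values come from the paper's experiment-setup text.
HPARAMS = {
    "IEMOCAP": {
        "lr": 3e-4,          # model learning rate
        "policy_lr": 2e-2,   # Adam learning rate for policy learning
        "batch_size": 32,    # conversations per batch
        "d_a": 1582,         # audio feature dimension
        "d_v": 342,          # visual feature dimension
        "d_t": 1024,         # text feature dimension
        "d_h": 200,
        "d_m": 100,
        "gcn_layers": 16,
        "lambda_eta": {"a": 3.0, "v": 0.0, "t": 1.0},
    },
    "MELD": {
        "lr": 1e-3,
        "policy_lr": 2e-2,
        "batch_size": 32,
        "d_a": 300,
        "d_v": 342,
        "d_t": 1024,
        "d_h": 200,
        "d_m": 100,
        "gcn_layers": 32,
        "lambda_eta": {"a": 0.5, "v": 0.5, "t": 1.5},
    },
    # Shared scalars reported for both datasets:
    # gamma=0.6, phi=0.2, omega=0.9, mu=0.1
}

MAX_SEQ_LEN = 200  # maximum sequence length used by the selection policy


def policy_distribution_size(batch_size: int) -> int:
    """Selection policy distribution size = max sequence length * batch size."""
    return MAX_SEQ_LEN * batch_size


print(policy_distribution_size(HPARAMS["IEMOCAP"]["batch_size"]))  # 6400
```

With the reported batch size of 32 conversations, the selection policy distribution size works out to 200 * 32 = 6400 per batch.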