Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Multi-Modal Attentive Prompt Learning for Few-shot Emotion Recognition in Conversations
Authors: Xingwei Liang, Geng Tu, Jiachen Du, Ruifeng Xu
JAIR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our proposed model's efficacy, we conducted extensive experiments on two widely recognized benchmark datasets, MELD and IEMOCAP. Our results demonstrate that the MAP framework outperforms state-of-the-art ERC models, yielding notable improvements of 3.5% and 0.4% in micro F1 scores. |
| Researcher Affiliation | Academia | Xingwei Liang (EMAIL), Geng Tu (EMAIL), Jiachen Du (EMAIL), Harbin Institute of Technology, Shenzhen, P.R. China, 518055; Ruifeng Xu (EMAIL), Harbin Institute of Technology, Shenzhen, P.R. China, 518055; Peng Cheng Laboratory, Shenzhen, China; Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies |
| Pseudocode | No | The paper describes the model architecture and procedures in detail using prose and mathematical equations. It does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks, nor does it present structured steps formatted like code. |
| Open Source Code | No | The paper does not provide any concrete access information for source code, such as a repository link, an explicit code release statement, or mention of code in supplementary materials. |
| Open Datasets | Yes | To evaluate our proposed model's efficacy, we conducted extensive experiments on two widely recognized benchmark datasets, MELD and IEMOCAP. Our results demonstrate that the MAP framework outperforms state-of-the-art ERC models, yielding notable improvements of 3.5% and 0.4% in micro F1 scores. ... Datasets. MELD (Poria et al., 2019) and IEMOCAP (Busso et al., 2008) are selected as our datasets. MELD contains 13,708 utterances from 1433 dialogues of the Friends TV series. It annotates each utterance with one of seven emotions (anger, disgust, fear, joy, neutral, sadness, or surprise). It contains a total of approximately 33 hours of dialogues. IEMOCAP is a multi-modal database of ten speakers involved in two-way dyadic conversations. ... MELD: https://affective-meld.github.io/ IEMOCAP: http://sail.usc.edu/iemocap/ |
| Dataset Splits | Yes | MELD and IEMOCAP are multi-modal ERC datasets that involve all the textual, visual, and acoustic information. The dataset details are provided in Table 1. ... Table 1: Training, validation, and test data distribution in the datasets. |
| Hardware Specification | No | The paper mentions training time comparisons in Section 4.7 and refers to models having a certain number of parameters, but it does not specify the exact hardware (e.g., GPU models, CPU types, memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper mentions various software components and models used, such as BERT, ResNet, VGGish, Bi-GRU, Transformer, RoBERTa, EmoBERTa, and the sklearn package. However, it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Hyperparameter Setting. The textual, visual, and acoustic inputs are initialized with BERT, ResNet, and VGGish, respectively. All weight matrices are given their initial values by sampling from a uniform distribution U(-0.1, 0.1). The optimal learning rate is set to 4e-7 for the MELD dataset and 6e-7 for the IEMOCAP dataset. The batch size is set to 1 and the number of epochs is set to 50 for MELD and 150 for IEMOCAP. The dropout rate is set to 0.1 for MELD and 0.5 for IEMOCAP. |
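The Research Type row reports micro-F1 improvements. As a clarifying aside, micro F1 pools true-positive, false-positive, and false-negative counts across all emotion classes before computing a single F1; the sketch below implements that pooling with only the standard library. The `micro_f1` function and the toy emotion labels are illustrative and are not from the paper or the authors' code.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool per-class TP/FP/FN counts, then compute one F1."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        tp += sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp += sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn += sum(t == c and p != c for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up labels for illustration only (not MELD or IEMOCAP data).
y_true = ["joy", "anger", "neutral", "sadness", "joy", "neutral"]
y_pred = ["joy", "neutral", "neutral", "sadness", "joy", "anger"]
score = micro_f1(y_true, y_pred)  # 4 of 6 correct -> 2/3
```

Note that for single-label multi-class tasks like ERC, pooled false positives equal pooled false negatives, so micro F1 coincides with accuracy.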
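The hyperparameters quoted in the Experiment Setup row can be collected into a small sketch. Only the numeric values come from the paper; the `CONFIG` dict, the `init_weights` helper, and the seed are illustrative, and the symmetric uniform interval U(-0.1, 0.1) is an assumption about the reported range.

```python
import random

# Per-dataset hyperparameters as quoted in the Experiment Setup row.
CONFIG = {
    "MELD":    {"lr": 4e-7, "batch_size": 1, "epochs": 50,  "dropout": 0.1},
    "IEMOCAP": {"lr": 6e-7, "batch_size": 1, "epochs": 150, "dropout": 0.5},
}

def init_weights(rows, cols, low=-0.1, high=0.1, seed=0):
    """Sample a weight matrix from U(low, high), mirroring the reported init."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(cols)] for _ in range(rows)]

W = init_weights(3, 5)  # every entry lies in [-0.1, 0.1]
```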