Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Catch Your Emotion: Sharpening Emotion Perception in Multimodal Large Language Models
Authors: Yiyang Fang, Jian Liang, Wenke Huang, He Li, Kehua Su, Mang Ye
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that SEPM significantly improves MLLM performance on emotion-related tasks, providing a resource-efficient and scalable solution for emotion recognition. |
| Researcher Affiliation | Academia | 1National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China 2Taikang Center for Life and Medical Sciences, Wuhan University, Wuhan, China. Correspondence to: Kehua Su <EMAIL>, Mang Ye <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 SEPM. Input: Multimodal Large Language Model M, coarse-grained query Qc, sample D. Output: specific emotion category E. |
| Open Source Code | Yes | Our code is available in https://github.com/fuyyyyy/SEPM. |
| Open Datasets | Yes | We evaluate our framework on four emotion datasets, which are annotated across different scenarios and numbers of categories: Emotion6 (Peng et al., 2015), EmoSet (Yang et al., 2023), WebEmo (Panda et al., 2018), and Abstract (Machajdik & Hanbury, 2010). |
| Dataset Splits | No | The paper lists several datasets used for evaluation (Emotion6, EmoSet, WebEmo, Abstract) but does not specify the train/validation/test splits used for their experiments. It mentions 'Zero-shot inference' but no explicit data partitioning details for reproducibility. |
| Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA 4090 GPUs, each with 24GB of memory. |
| Software Dependencies | No | The paper mentions using LLaVA (Liu et al., 2023) and VILA (Lin et al., 2024) as foundation models, but it does not specify version numbers for these or any other software libraries or programming languages used for implementation. |
| Experiment Setup | Yes | The confidence threshold α and drop rate β are set to 0.1 and 0.2 by default, respectively. |
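The Experiment Setup row names two hyperparameters, a confidence threshold α (0.1) and a drop rate β (0.2), and the Pseudocode row describes a coarse-to-fine query pipeline. A minimal, hypothetical Python sketch of how such a candidate-filtering step could work is shown below; the function name and the exact filtering rule are assumptions for illustration, not the paper's implementation.

```python
def filter_candidates(probs, alpha=0.1, beta=0.2):
    """Hypothetical SEPM-style filter: keep emotion candidates whose
    confidence is at least alpha, then drop the lowest-scoring beta
    fraction of the survivors before the fine-grained query."""
    kept = {label: p for label, p in probs.items() if p >= alpha}
    ranked = sorted(kept, key=kept.get, reverse=True)
    n_drop = int(len(ranked) * beta)
    # Survivors would be passed to the fine-grained emotion query.
    return ranked[: len(ranked) - n_drop] if n_drop else ranked

# Example: confidences from a coarse-grained query over broad categories.
coarse = {"positive": 0.55, "negative": 0.30, "neutral": 0.15, "ambiguous": 0.05}
print(filter_candidates(coarse))
```

With the default α = 0.1, the low-confidence "ambiguous" candidate is removed before the drop-rate step, narrowing the label set the model must discriminate among in the fine-grained pass.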