Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Catch Your Emotion: Sharpening Emotion Perception in Multimodal Large Language Models
Authors: Yiyang Fang, Jian Liang, Wenke Huang, He Li, Kehua Su, Mang Ye
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that SEPM significantly improves MLLM performance on emotion-related tasks, providing a resource-efficient and scalable solution for emotion recognition. |
| Researcher Affiliation | Academia | 1National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China 2Taikang Center for Life and Medical Sciences, Wuhan University, Wuhan, China. Correspondence to: Kehua Su <EMAIL>, Mang Ye <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 SEPM. Input: Multimodal Large Language Model M, coarse-grained query Qc, sample D. Output: specific emotion category E. |
| Open Source Code | Yes | Our code is available in https://github.com/fuyyyyy/SEPM. |
| Open Datasets | Yes | We evaluate our framework on four emotion datasets, which are annotated across different scenarios and numbers of categories: Emotion6 (Peng et al., 2015), EmoSet (Yang et al., 2023), WebEmo (Panda et al., 2018), and Abstract (Machajdik & Hanbury, 2010). |
| Dataset Splits | No | The paper lists several datasets used for evaluation (Emotion6, EmoSet, WebEmo, Abstract) but does not specify the train/validation/test splits used for their experiments. It mentions 'Zero-shot inference' but no explicit data partitioning details for reproducibility. |
| Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA 4090 GPUs, each with 24GB of memory. |
| Software Dependencies | No | The paper mentions using LLaVA (Liu et al., 2023) and VILA (Lin et al., 2024) as foundation models, but it does not specify version numbers for these or any other software libraries or programming languages used for implementation. |
| Experiment Setup | Yes | The confidence threshold α and drop rate β are set to 0.1 and 0.2 by default, respectively. |
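The Experiment Setup row names two hyperparameters, a confidence threshold α (0.1) and a drop rate β (0.2), and the Pseudocode row describes a coarse-to-fine query pipeline. A minimal, hypothetical Python sketch of how such a candidate-filtering step could work is shown below; the function name and the exact filtering rule are assumptions for illustration, not the paper's implementation.

```python
def filter_candidates(probs, alpha=0.1, beta=0.2):
    """Hypothetical SEPM-style filter: keep emotion candidates whose
    confidence is at least alpha, then drop the lowest-scoring beta
    fraction of the survivors before the fine-grained query."""
    kept = {label: p for label, p in probs.items() if p >= alpha}
    ranked = sorted(kept, key=kept.get, reverse=True)
    n_drop = int(len(ranked) * beta)
    # Survivors would be passed to the fine-grained emotion query.
    return ranked[: len(ranked) - n_drop] if n_drop else ranked

# Example: confidences from a coarse-grained query over broad categories.
coarse = {"positive": 0.55, "negative": 0.30, "neutral": 0.15, "ambiguous": 0.05}
print(filter_candidates(coarse))
```

With the default α = 0.1, the low-confidence "ambiguous" candidate is removed before the drop-rate step, narrowing the label set the model must discriminate among in the fine-grained pass.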