Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Hyper-Modality Enhancement for Multimodal Sentiment Analysis with Missing Modalities

Authors: Yan Zhuang, Minhao Liu, Wei Bai, Yanru Zhang, Wei Li, Jiawen Deng, Fuji Ren

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on three public benchmarks show that HME consistently outperforms state-of-the-art methods under various missing modality conditions, demonstrating its practicality in real-world MSA applications.
Researcher Affiliation	Academia	1 University of Electronic Science and Technology of China; 2 Shenzhen Institute for Advanced Study, UESTC; 3 Tsinghua University
Pseudocode	No	The paper describes the methodology in prose and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	The paper does not provide open access to the data and code.
Open Datasets	Yes	Datasets. Three widely-used datasets are adapted, named CMU-MOSI [38], CMU-MOSEI [39] and IEMOCAP [40].
Dataset Splits	Yes	The UR-FUNNY dataset contains 1,866 videos from 1,741 speakers, comprising 9,588 utterances. The data are divided into 7,614 training, 980 validation, and 994 test instances. The MUStARD dataset includes 690 videos, split into 539 training, 68 validation, and 68 test utterances.
Hardware Specification	Yes	All models are evaluated on an NVIDIA GTX 3090 GPU.
Software Dependencies	No	The paper mentions several tools used for feature extraction (e.g., BERT-base, MTCNN, OpenFace, COVAREP) and discusses re-implementing baselines from their open-source codes, but it does not specify the software dependencies (e.g., Python, PyTorch/TensorFlow versions, or other libraries with version numbers) for their own HME implementation.
Experiment Setup	Yes	To ensure reliable training and prevent overfitting or underfitting, we train 100 epochs and apply early stopping with a patience of 10 epochs across all reproduced baseline models and HME. Specifically, training is terminated when the validation loss fails to improve for 10 consecutive epochs. [...] For the MOSI and MOSEI datasets, the default hyper-parameters under both settings are as follows: learning rate of 4e-5, batch size of 256, and VIB loss weight α = 1.0. The hidden dimension d for MOSI and MOSEI also differ across protocols. For MOSI, the fixed missing protocol uses a hidden dimension d of 96, while the random missing protocol uses 192. In the case of MOSEI, the hidden dimension d is set to 192 for both protocols. In contrast, for the IEMOCAP dataset, the default hyper-parameters are consistent across both missing protocols, with a learning rate of 1e-4, batch size of 24, VIB loss weight α = 0.5, and hidden dimension d = 30.
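
The hyper-parameters and stopping rule quoted in the Experiment Setup row can be collected into a small configuration sketch. The paper does not release code, so this is not the authors' implementation: the names HParams, DEFAULTS, and stopping_epoch are hypothetical, and the sketch assumes that "fails to improve" means the validation loss is not strictly lower than the best value seen so far. Only the numeric values come from the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HParams:
    """Hypothetical container; field names are illustrative, values are quoted."""
    learning_rate: float
    batch_size: int
    vib_weight: float  # VIB loss weight (alpha in the quoted text)
    hidden_dim: int    # hidden dimension d


# Defaults per (dataset, missing-modality protocol), as quoted in the table row.
DEFAULTS = {
    ("MOSI", "fixed"): HParams(4e-5, 256, 1.0, 96),
    ("MOSI", "random"): HParams(4e-5, 256, 1.0, 192),
    ("MOSEI", "fixed"): HParams(4e-5, 256, 1.0, 192),
    ("MOSEI", "random"): HParams(4e-5, 256, 1.0, 192),
    ("IEMOCAP", "fixed"): HParams(1e-4, 24, 0.5, 30),
    ("IEMOCAP", "random"): HParams(1e-4, 24, 0.5, 30),
}

MAX_EPOCHS = 100  # quoted: "we train 100 epochs"
PATIENCE = 10     # quoted: "early stopping with a patience of 10 epochs"


def stopping_epoch(val_losses, max_epochs=MAX_EPOCHS, patience=PATIENCE):
    """Return the 1-based epoch at which training would stop.

    Assumption: an epoch counts as an improvement only if its validation
    loss is strictly below the best loss seen so far.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return min(len(val_losses), max_epochs)
```

For instance, DEFAULTS[("MOSI", "fixed")].hidden_dim gives the hidden dimension quoted for the MOSI fixed missing protocol, and stopping_epoch can be applied to a recorded validation-loss curve to check where a run under this rule would have ended.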