Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding

Authors: Zhicheng Zhang, Wuyou Xia, Chenxi Zhao, Zhou Yan, Xiaoqiang Liu, Yongjie Zhu, Wenyu Qin, Pengfei Wan, Di Zhang, Jufeng Yang

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on 21 benchmark datasets verify the effectiveness of MODA in perception, cognition, and emotion tasks.
Researcher Affiliation | Collaboration | Zhicheng Zhang (1,2), Wuyou Xia (1), Chenxi Zhao (1), Yan Zhou (3), Xiaoqiang Liu (3), Yongjie Zhu (3), Wenyu Qin (3), Pengfei Wan (3), Di Zhang (3), Jufeng Yang (1,2). (1) VCIP & TMCC & DISSec, College of Computer Science, Nankai University; (2) Pengcheng Laboratory; (3) Kuaishou Technology.
Pseudocode | No | The paper describes methods in paragraph text and mathematical formulations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Source code and demo are available at https://zzcheng.top/MODA.
Open Datasets | Yes | Perception: Following (Tong et al., 2024a), we conduct experiments on 4 types of perception tasks (i.e., general, knowledge, OCR, and vision-centric) across 16 benchmarks: MME (Fu et al., 2023), MMBench (Liu et al., 2025), SEED (Li et al., 2024), GQA (Hudson & Manning, 2019), ScienceQA (Lu et al., 2022), MMMU (Yue et al., 2024), MathVista (Lu et al., 2024), AI2D (Kembhavi et al., 2016), ChartQA (Masry et al., 2022), OCRBench (Liu et al., 2024), TextVQA (Singh et al., 2019), DocVQA (Mathew et al., 2021), MMVP (Tong et al., 2024b), RealworldQA (xAI, 2024), and CV-Bench (Tong et al., 2024a). Cognition: Following (Dai et al., 2025), we conduct experiments on MMRole to evaluate role-playing performance from 8 aspects. Emotion: Following (Yang et al., 2023; Huang et al., 2024), we conduct experiments on 4 benchmark datasets. MVSA-S and MVSA-M (Niu et al., 2016) are datasets used for sentiment polarity classification [...] TumEmo (Yang et al., 2021) is a multimodal dataset [...] HFM (Liu et al., 2022) is a multimodal dataset.
Dataset Splits | No | The paper states, "For a fair comparison, all models are trained on 700K data samples for 1 epoch," and mentions using a batch size of 2048, but it does not specify explicit training, validation, or test dataset splits in terms of percentages, counts, or references to predefined splits.
Hardware Specification | No | The paper mentions using specific visual encoders (CLIP (ViT-L/14)) and foundational large language models (Llama-3-Instruct-8B, Hermes2-Yi-34B) but does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for the experiments.
Software Dependencies | No | The paper mentions the use of the AdamW optimizer and foundational models like Llama-3-Instruct-8B and Hermes2-Yi-34B, but it does not specify version numbers for any key software components or libraries (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | MODA is trained for 1 epoch with a batch size of 2048, using the AdamW (Loshchilov & Hutter, 2019) optimizer with a cosine learning rate schedule. The learning rate is set to 2e-5 for the LLM and 2e-6 for the visual encoder, respectively. The warmup rate is 0.03.
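The schedule reported above (cosine decay with a warmup rate of 0.03, base learning rates of 2e-5 for the LLM and 2e-6 for the visual encoder) can be sketched as a small function. This is a minimal sketch, not the authors' code; it assumes the warmup is linear and that "warmup rate" means the fraction of total training steps spent warming up, both common conventions:

```python
import math

def lr_at_step(step, total_steps, base_lr, warmup_rate=0.03):
    """Learning rate under linear warmup followed by cosine decay to zero."""
    warmup_steps = max(1, int(total_steps * warmup_rate))
    if step < warmup_steps:
        # Linear warmup: ramp from ~0 up to base_lr over the warmup steps.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay: fall from base_lr to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Per the reported setup, one would evaluate this with `base_lr=2e-5` for the LLM parameter group and `base_lr=2e-6` for the visual-encoder group; the `total_steps` value is not given in the paper and would follow from the 700K samples and batch size 2048.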