Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Asymmetric Reinforcing Against Multi-Modal Representation Bias

Authors: Xiyuan Gao, Bing Cao, Pengfei Zhu, Nannan Wang, Qinghua Hu

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments validated our superiority on various multimodal classification datasets against the SOTAs. Comparison with Imbalanced Multimodal Learning Methods: In this section, we compared ARM with advanced imbalanced multimodal learning methods to answer Q1: How does ARM narrow the modality contribution gap? Fig. 3 illustrates the trend of narrowing contribution gaps across different methods. Table 1 further reinforces this conclusion. ARM consistently outperforms other state-of-the-art methods, i.e., Greedy (Wu et al. 2022), OGM-GE (Peng et al. 2022), QMF (Zhang et al. 2023), PMR (Fan et al. 2023), Sample-valuation, Modality-valuation (Wei et al. 2024), and MLA (Zhang et al. 2024), achieving competitive accuracy scores of 66.52% and 75.60%, respectively. Table 4 further validates these observations with an ablation study.
Researcher Affiliation Academia 1) College of Intelligence and Computing, Tianjin University, Tianjin, 300000, China; 2) The State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, 710000, China. EMAIL, EMAIL
Pseudocode No The paper describes the proposed method using mathematical formulations and descriptive text (e.g., equations for MI, CMI, and loss functions), but it does not include a distinct block explicitly labeled as "Pseudocode" or "Algorithm" with structured steps.
Open Source Code No The paper states: "More details of implementation and experiment analysis are provided in the Appendix." However, it does not contain an explicit statement about releasing the source code or provide a link to a code repository.
Open Datasets Yes Kinetic Sounds (KS) (Arandjelovic and Zisserman 2017) is a specifically designed action recognition dataset for research in audio-visual learning... UCF-51 is a subset of UCF-101 (Soomro, Zamir, and Shah 2012)... UPMC Food-101 (Wang et al. 2015) is a comprehensive dataset for food recognition
Dataset Splits Yes UPMC Food-101 (Wang et al. 2015) is a comprehensive dataset for food recognition, consisting of 101,000 images accompanied by corresponding texts across 101 food categories. Each category includes 750 images for training and 250 images for testing.
Hardware Specification Yes The experiments are conducted on Huawei Atlas 800 Training Server with CANN and NVIDIA 4090 GPU.
Software Dependencies No Unless otherwise specified, ResNet-18 is used as the backbone in the experiments and trained from scratch. Encoders used for UCF-51 are ImageNet pre-trained. For Food-101, a ViT-based model is used as the vision encoder, and a BERT-based model is used as the text encoder, both pre-trained. During training, we use Stochastic Gradient Descent (SGD)... The experiments are conducted on Huawei Atlas 800 Training Server with CANN and NVIDIA 4090 GPU. The paper mentions various models (ResNet-18, ViT, BERT), an optimizer (SGD), and a compute framework (CANN), but does not provide specific version numbers for any of these software components.
Experiment Setup Yes Unless otherwise specified, ResNet-18 is used as the backbone in the experiments and trained from scratch. Encoders used for UCF-51 are ImageNet pre-trained. For Food-101, a ViT-based model is used as the vision encoder, and a BERT-based model is used as the text encoder, both pre-trained. Before modality valuation, a warm-up stage is employed for all experiments. During training, we use Stochastic Gradient Descent (SGD) with a batch size of 64. We set the initial learning rate, weight decay, and momentum parameters to 10^-3, 5×10^-4, and 0.9, respectively.
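The optimizer settings quoted above (SGD, learning rate 10^-3, weight decay 5×10^-4, momentum 0.9) can be made concrete with a minimal sketch of one SGD-with-momentum step on a scalar parameter, following the common PyTorch-style convention of coupled L2 weight decay. This is an illustrative sketch only; the function name and the scalar setup are ours, not the paper's.

```python
# Hyperparameters as reported in the paper's experiment setup.
LR, WEIGHT_DECAY, MOMENTUM = 1e-3, 5e-4, 0.9

def sgd_step(param, grad, velocity):
    """One SGD step with momentum and coupled L2 weight decay (scalar case)."""
    grad = grad + WEIGHT_DECAY * param      # add L2 penalty gradient
    velocity = MOMENTUM * velocity + grad   # update momentum buffer
    return param - LR * velocity, velocity  # descend along the buffer

# One illustrative update from param=1.0 with gradient 0.5.
p, v = sgd_step(1.0, grad=0.5, velocity=0.0)
# p ≈ 0.9994995, v = 0.5005
```

A batch size of 64 would simply determine how `grad` is averaged over samples before this step; the update rule itself is unchanged.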