Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Asymmetric Reinforcing Against Multi-Modal Representation Bias
Authors: Xiyuan Gao, Bing Cao, Pengfei Zhu, Nannan Wang, Qinghua Hu
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validated our superiority on various multimodal classification datasets against the SOTAs. Comparison with Imbalanced Multimodal Learning Methods: In this section, we compared ARM with advanced imbalanced multimodal learning methods to answer Q1: How does ARM narrow the modality contribution gap? Fig. 3 illustrates the trend of narrowing contribution gaps across different methods. Table 1 further reinforces this conclusion. ARM consistently outperforms other state-of-the-art methods, i.e., Greedy (Wu et al. 2022), OGM-GE (Peng et al. 2022), QMF (Zhang et al. 2023), PMR (Fan et al. 2023), Sample-valuation and Modality-valuation (Wei et al. 2024), and MLA (Zhang et al. 2024), achieving competitive accuracy scores of 66.52% and 75.60%, respectively. Table 4 further validates these observations with an ablation study. |
| Researcher Affiliation | Academia | 1College of Intelligence and Computing, Tianjin University, Tianjin, 300000, China 2The State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, 710000, China EMAIL, EMAIL |
| Pseudocode | No | The paper describes the proposed method using mathematical formulations and descriptive text (e.g., equations for MI, CMI, and loss functions), but it does not include a distinct block explicitly labeled as "Pseudocode" or "Algorithm" with structured steps. |
| Open Source Code | No | The paper states: "More details of implementation and experiment analysis are provided in the Appendix." However, it does not contain an explicit statement about releasing the source code or provide a link to a code repository. |
| Open Datasets | Yes | Kinetic Sounds (KS) (Arandjelovic and Zisserman 2017) is a specifically designed action recognition dataset for research in audio-visual learning... UCF-51 is a subset of UCF-101 (Soomro, Zamir, and Shah 2012)... UPMC Food-101 (Wang et al. 2015) is a comprehensive dataset for food recognition |
| Dataset Splits | Yes | UPMC Food-101 (Wang et al. 2015) is a comprehensive dataset for food recognition, consisting of 101,000 images accompanied by corresponding texts across 101 food categories. Each category includes 750 images for training and 250 images for testing. |
| Hardware Specification | Yes | The experiments are conducted on Huawei Atlas 800 Training Server with CANN and NVIDIA 4090 GPU. |
| Software Dependencies | No | Unless otherwise specified, ResNet-18 is used as the backbone in the experiments and trained from scratch. Encoders used for UCF-51 are ImageNet pre-trained. For Food-101, a pre-trained ViT-based model is used as the vision encoder and a pre-trained BERT-based model as the text encoder. During training, we use Stochastic Gradient Descent (SGD)... The experiments are conducted on a Huawei Atlas 800 Training Server with CANN and an NVIDIA 4090 GPU. The paper mentions various models (ResNet-18, ViT, BERT), optimizers (SGD), and frameworks (CANN), but does not provide specific version numbers for any of these software components. |
| Experiment Setup | Yes | Unless otherwise specified, ResNet-18 is used as the backbone in the experiments and trained from scratch. Encoders used for UCF-51 are ImageNet pre-trained. For Food-101, a pre-trained ViT-based model is used as the vision encoder and a pre-trained BERT-based model as the text encoder. Before modality valuation, a warm-up stage is employed for all experiments. During training, we use Stochastic Gradient Descent (SGD) with a batch size of 64. We set the initial learning rate, weight decay, and momentum parameters to 10⁻³, 5×10⁻⁴, and 0.9, respectively. |
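The training hyperparameters quoted above can be collected into a single configuration for anyone attempting to reproduce the setup. This is a minimal sketch only: the paper's excerpt names the optimizer and values but not the framework, so the dict keys and the commented PyTorch mapping are assumptions, not the authors' code.

```python
# Hyperparameters as reported in the paper's experiment setup.
# The dict layout itself is illustrative; only the values are sourced.
train_config = {
    "optimizer": "SGD",
    "batch_size": 64,
    "lr": 1e-3,           # initial learning rate, 10^-3
    "weight_decay": 5e-4, # 5 * 10^-4
    "momentum": 0.9,
}

# If reproducing in PyTorch (an assumption; the framework is unspecified),
# this would map to something like:
#   torch.optim.SGD(model.parameters(), lr=train_config["lr"],
#                   weight_decay=train_config["weight_decay"],
#                   momentum=train_config["momentum"])
print(train_config["lr"], train_config["weight_decay"], train_config["momentum"])
```

A warm-up stage before modality valuation is also reported, but its length and schedule are not given in the excerpt, so they are left out of the sketch.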