Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A Closer Look at Multimodal Representation Collapse
Authors: Abhra Chaudhuri, Anjan Dutta, Tu Bui, Serban Georgescu
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple multimodal benchmarks validate our theoretical claims. Project page: https://abhrac.github.io/mmcollapse/. |
| Researcher Affiliation | Collaboration | 1Fujitsu Research of Europe 2University of Surrey. Correspondence to: Abhra Chaudhuri <EMAIL>. |
| Pseudocode | No | The paper describes an algorithm called Explicit Basis Reallocation (EBR) in Section 3.4, detailing the optimization criteria and updates using mathematical notation. However, it does not present this as a formally structured pseudocode block or algorithm figure. |
| Open Source Code | No | The abstract mentions a 'Project page: https://abhrac.github.io/mmcollapse/'. This is a high-level project overview page and not a direct link to a source-code repository, nor does the text explicitly state that the code for the described methodology is released or available. |
| Open Datasets | Yes | We choose the MIMIC-IV (Johnson et al., 2023) and AV-MNIST (Vielzeuf et al., 2018) datasets for our experiments. |
| Dataset Splits | Yes | For MIMIC-IV, we follow the same settings as (Wu et al., 2024) and that of (Wang et al., 2023; Ma et al., 2021) for AV-MNIST. [...] To evaluate our approach, we adopt the experimental setup of MUSE (Wu et al., 2024), following which we mask out the modalities in the MIMIC-IV dataset with probabilities {0.1, 0.2, 0.3, 0.4, 0.7}. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as specific GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions using 'Tian et al. (2020) as our cross-modal knowledge distillation (KD) algorithm of choice applied on top of MUSE (Wu et al., 2024)', but it does not specify version numbers for any software libraries, frameworks, or tools. |
| Experiment Setup | Yes | The two hidden layers of ψ have output dimensionalities 512 and 256 respectively. The hidden layers of h have output dimensionalities 1024 and 512 respectively, whereas those of h⁻¹ are 512 and 1024. The model was trained for 1200 epochs, with an initial learning rate of 0.01, decayed at a rate of 0.9 every 100 epochs. We interleave between the optimization of Lmd and Lsem every 10 epochs. |
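The training schedule quoted above (step-decayed learning rate, losses alternated on a fixed epoch interval) can be sketched as plain Python. This is a minimal illustration of the quoted hyperparameters only; the function names, the decay formula, and the assumption that the alternation starts with Lmd are ours, not taken from the paper.

```python
def learning_rate(epoch, lr0=0.01, gamma=0.9, step=100):
    """Learning rate at a given epoch: initial 0.01, decayed by 0.9 every 100 epochs."""
    return lr0 * gamma ** (epoch // step)

def active_loss(epoch, interval=10):
    """Which loss is optimized at this epoch, assuming a simple alternation
    every `interval` epochs starting with Lmd (the starting order is an assumption)."""
    return "Lmd" if (epoch // interval) % 2 == 0 else "Lsem"

# Layer widths quoted in the paper:
PSI_HIDDEN = (512, 256)      # two hidden layers of psi
H_HIDDEN = (1024, 512)       # hidden layers of h
H_INV_HIDDEN = (512, 1024)   # hidden layers of h^-1 (mirrored)
```

For example, `learning_rate(150)` returns 0.009 (one decay step applied), and epochs 0-9 optimize Lmd while epochs 10-19 optimize Lsem under the assumed ordering.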