MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance
Authors: Yake Wei, Di Hu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, experiments across multiple types of modalities and frameworks with dense cross-modal interaction indicate our superior and extendable method performance. Our method is also expected to facilitate multi-task cases with a clear discrepancy in task difficulty, demonstrating its ideal scalability. The source code and dataset are available at https://github.com/GeWu-Lab/MMPareto_ICML2024. |
| Researcher Affiliation | Academia | 1Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China 2Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China. Correspondence to: Di Hu <dihu@ruc.edu.cn>. |
| Pseudocode | Yes | The overall algorithm is shown in Algorithm 1 |
| Open Source Code | Yes | The source code and dataset are available at https://github.com/GeWu-Lab/MMPareto_ICML2024. |
| Open Datasets | Yes | CREMA-D (Cao et al., 2014) is an audio-visual dataset for emotion recognition, including 7,442 video clips... Kinetics-Sounds (Arandjelovic & Zisserman, 2017) is an audio-visual dataset containing 31 human action classes... Colored-and-gray-MNIST (Kim et al., 2019) is a synthetic dataset based on MNIST (LeCun et al., 1998)... ModelNet40 (Wu et al., 2015) is a dataset with 3D objects, covering 40 categories. |
| Dataset Splits | Yes | ModelNet40 (Wu et al., 2015) is a dataset with 3D objects, covering 40 categories. It contains 9,483 training samples and 2,468 test samples. |
| Hardware Specification | Yes | All models are trained on 2 NVIDIA RTX 3090 (Ti). |
| Software Dependencies | No | The paper mentions using "SGD with momentum (0.9)" but does not provide specific version numbers for software libraries, frameworks (like PyTorch or TensorFlow), or other dependencies. |
| Experiment Setup | Yes | During the training, we use SGD with momentum (0.9) and γ = 1.5 in experiments. More details are provided in Appendix B. [...] we use SGD with momentum (0.9) and set the learning rate at 1e-3. |
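
For reference, the reported optimizer settings translate into the minimal PyTorch sketch below. The `model` is a placeholder standing in for the paper's multimodal network, and `gamma = 1.5` is only recorded here as the MMPareto hyperparameter cited in the setup; how it enters the gradient-integration step is not reproduced.

```python
# Minimal sketch of the reported training configuration, assuming PyTorch.
# `model` is a placeholder; the actual backbone depends on the dataset
# (audio-visual, colored/gray MNIST, or ModelNet40 front/rear views).
import torch

model = torch.nn.Linear(512, 40)  # placeholder network

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=1e-3,       # learning rate reported in the experiment setup
    momentum=0.9,  # SGD momentum reported in the experiment setup
)

gamma = 1.5  # MMPareto hyperparameter from the paper (usage not shown here)
```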