MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance

Authors: Yake Wei, Di Hu

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, experiments across multiple types of modalities and frameworks with dense cross-modal interaction indicate our superior and extendable method performance. Our method is also expected to facilitate multi-task cases with a clear discrepancy in task difficulty, demonstrating its ideal scalability. The source code and dataset are available at https://github.com/GeWu-Lab/MMPareto_ICML2024.
Researcher Affiliation | Academia | ¹Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China. ²Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China. Correspondence to: Di Hu <dihu@ruc.edu.cn>.
Pseudocode | Yes | The overall algorithm is shown in Algorithm 1. (A hedged sketch of the gradient-combination idea behind it appears after this table.)
Open Source Code | Yes | The source code and dataset are available at https://github.com/GeWu-Lab/MMPareto_ICML2024.
Open Datasets | Yes | CREMA-D (Cao et al., 2014) is an audio-visual dataset for emotion recognition, including 7,442 video clips... Kinetics Sounds (Arandjelovic & Zisserman, 2017) is an audio-visual dataset containing 31 human action classes... Colored-and-gray-MNIST (Kim et al., 2019) is a synthetic dataset based on MNIST (LeCun et al., 1998)... ModelNet40 (Wu et al., 2015) is a dataset with 3D objects, covering 40 categories.
Dataset Splits | Yes | ModelNet40 (Wu et al., 2015) is a dataset with 3D objects, covering 40 categories. It contains 9,483 training samples and 2,468 test samples.
Hardware Specification | Yes | All models are trained on 2 NVIDIA RTX 3090 (Ti).
Software Dependencies | No | The paper mentions using "SGD with momentum (0.9)" but does not provide specific version numbers for software libraries, frameworks (like PyTorch or TensorFlow), or other dependencies.
Experiment Setup | Yes | During the training, we use SGD with momentum (0.9) and γ = 1.5 in experiments. More details are provided in Appendix B. [...] we use SGD with momentum (0.9) and set the learning rate at 1e-3. (A minimal optimizer configuration consistent with this quote follows below.)
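The paper's Algorithm 1 is not reproduced in this table. As a point of reference only, below is a minimal sketch of the classic two-gradient Pareto combination (the MGDA-style minimum-norm solution) that conflict-free multimodal methods such as MMPareto build on. The function name and the use of PyTorch are assumptions, and this is an illustrative stand-in, not the paper's Algorithm 1.

```python
import torch


def pareto_combine(g_multi: torch.Tensor, g_uni: torch.Tensor) -> torch.Tensor:
    """Combine a multimodal-loss gradient and a unimodal-loss gradient.

    Minimal MGDA-style sketch: find the convex combination
    alpha * g_multi + (1 - alpha) * g_uni with minimum norm, which is a
    common descent direction for both objectives when one exists.
    Hypothetical illustration; not the paper's Algorithm 1.
    """
    diff = g_multi - g_uni
    denom = diff.dot(diff)
    if denom.item() == 0.0:
        # Gradients are identical, so any convex combination is the same.
        return g_multi
    # Closed-form minimizer of ||alpha*g_multi + (1-alpha)*g_uni||^2,
    # clamped to the valid range [0, 1].
    alpha = ((g_uni - g_multi).dot(g_uni) / denom).clamp(0.0, 1.0)
    return alpha * g_multi + (1.0 - alpha) * g_uni
```

Where the reported γ = 1.5 enters (e.g., as a magnitude adjustment on the combined gradient) is specified by the paper's Algorithm 1, not by this sketch.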
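The Experiment Setup row quotes SGD with momentum 0.9 and a learning rate of 1e-3. A minimal configuration consistent with that quote might look as follows; the PyTorch framework and the placeholder model are assumptions, since the paper does not name its library versions (see the Software Dependencies row).

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; the paper's actual backbones are not
# specified in this table.
model = nn.Linear(512, 40)

# Settings quoted in the Experiment Setup row: SGD, momentum 0.9, lr 1e-3.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
```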