Robustness in Multimodal Learning under Train-Test Modality Mismatch
Authors: Brandon Mckinzie, Vaishaal Shankar, Joseph Yitan Cheng, Yinfei Yang, Jonathon Shlens, Alexander T Toshev
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a multimodal robustness framework to provide a systematic analysis of common multimodal representation learning methods. Further, we identify robustness shortcomings of these approaches and propose two intervention techniques leading to 1.5x-4x robustness improvements on three datasets, AudioSet, Kinetics-400 and ImageNet-Captions. |
| Researcher Affiliation | Industry | 1Apple ML Research 2Work done while at Apple 3Apple. Correspondence to: Alexander Toshev <toshev@apple.com>. |
| Pseudocode | No | The paper describes algorithms and models in text and diagrams (e.g., Figure 3 for MASD) but does not include any explicit pseudocode blocks or algorithm listings. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | We focus our experiments on representation learning with the AudioSet dataset (Gemmeke et al., 2017)... Additionally, we explore the generality of our results on Kinetics-400 (Kay et al., 2017) and ImageNet-Captions (Fang et al., 2022a). |
| Dataset Splits | Yes | AudioSet consists of an unbalanced training set of 1,743,790 examples, used as unlabeled pretraining data, plus training and evaluation sets of 18,649 and 17,065 examples, respectively, used for the downstream task. |
| Hardware Specification | No | The paper mentions training on specific datasets and using certain models (e.g., ViT-B/16 architecture) but does not specify any hardware details such as GPU models, CPU types, or cloud computing instances used for the experiments. |
| Software Dependencies | No | The paper mentions using specific optimizers (AdamW) and models (CLIP, VATT, MAE) but does not provide specific version numbers for any software dependencies like programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Table 2. Training hyperparameters used for pretraining, linear probing, and finetuning (Contrastive / MAE): global batch 1024/1024 (pretraining), 256/128 (linear probing), 128/64 (finetuning); learning rate 8e-4/8e-4, 1e-2/1e-2, 1e-4/1e-4; LR warmup 1000/2000, 200/200, 1000/2000; epochs 32/256, 360/360, 30/60; optimizer AdamW throughout. |
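The reported hyperparameters can be organized by training stage and method. The following is an illustrative sketch only, assuming a nested-dict layout; the key names and structure are not from the paper, but the numeric values mirror its Table 2.

```python
# Hypothetical config dict mirroring the paper's Table 2 hyperparameters.
# Layout and key names are assumptions; values are as reported.
HYPERPARAMS = {
    "pretraining": {
        "contrastive": {"global_batch": 1024, "learning_rate": 8e-4,
                        "lr_warmup": 1000, "epochs": 32, "optimizer": "AdamW"},
        "mae": {"global_batch": 1024, "learning_rate": 8e-4,
                "lr_warmup": 2000, "epochs": 256, "optimizer": "AdamW"},
    },
    "linear_probing": {
        "contrastive": {"global_batch": 256, "learning_rate": 1e-2,
                        "lr_warmup": 200, "epochs": 360, "optimizer": "AdamW"},
        "mae": {"global_batch": 128, "learning_rate": 1e-2,
                "lr_warmup": 200, "epochs": 360, "optimizer": "AdamW"},
    },
    "finetuning": {
        "contrastive": {"global_batch": 128, "learning_rate": 1e-4,
                        "lr_warmup": 1000, "epochs": 30, "optimizer": "AdamW"},
        "mae": {"global_batch": 64, "learning_rate": 1e-4,
                "lr_warmup": 2000, "epochs": 60, "optimizer": "AdamW"},
    },
}

if __name__ == "__main__":
    # Print each stage/method configuration for quick inspection.
    for stage, methods in HYPERPARAMS.items():
        for method, cfg in methods.items():
            print(f"{stage:14s} {method:12s} {cfg}")
```

A structure like this makes the stage/method pairs explicit, which the flattened one-line table obscures.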