Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications

Authors: Paul Pu Liang, Chun Kai Ling, Yun Cheng, Alexander Obolenskiy, Yudong Liu, Rohan Pandey, Alex Wilf, Louis-Philippe Morency, Russ Salakhutdinov

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate these estimated bounds and show how they accurately track true interactions. Finally, we show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks. To validate our bounds, we experiment on both synthetic and large real-world datasets with varying amounts of interactions.
Researcher Affiliation | Academia | Carnegie Mellon University, Columbia University, Princeton University
Pseudocode | No | The paper describes algorithms and derivations in prose and mathematical notation but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We believe these results shed light on the intriguing connections between multimodal interactions, modality disagreement, and model performance, and release our code and models at https://github.com/pliang279/PID.
Open Datasets | Yes | We use a collection of 10 real-world datasets from MultiBench (Liang et al., 2021) which add up to a size of more than 700,000 datapoints.
Dataset Splits | No | The paper mentions a "test split" ("For the test split, we use unsupervised clustering...") but does not provide specific details on how the training and validation sets are partitioned (e.g., percentages, sample counts, or explicit standard splits).
Hardware Specification | Yes | All training is done on TPU v2-8 accelerators, with continuous pretraining taking 30 minutes and using up to 9GB of memory.
Software Dependencies | No | The paper mentions using Python for analysis and cites various libraries in its references, but it does not specify exact version numbers for any key software components or libraries used in their implementation.
Experiment Setup | Yes | We continuously pretrain and then finetune a pretrained MERLOT Reserve Base model on the datasets with a batch size of 8. During pretraining, we train the model for 960 steps with a learning rate of 0.0001, and no warm-up steps, and use the defaults for other hyperparameters.
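As an illustration of the Experiment Setup row above, the following is a minimal sketch that collects the quoted hyperparameters (batch size 8, 960 pretraining steps, learning rate 0.0001, no warm-up steps) into a single configuration object. The names used here (`ContinuedPretrainConfig`, `warmup_steps`, etc.) are illustrative assumptions and do not come from the released PID repository or the MERLOT Reserve training scripts; other hyperparameters are left at defaults, as the paper states.

```python
from dataclasses import dataclass


@dataclass
class ContinuedPretrainConfig:
    """Hyperparameters quoted in the Experiment Setup row; field names are illustrative."""
    model_name: str = "MERLOT Reserve Base"  # pretrained model that is continuously pretrained, then finetuned
    batch_size: int = 8                      # "a batch size of 8"
    pretrain_steps: int = 960                # "train the model for 960 steps"
    learning_rate: float = 1e-4              # "a learning rate of 0.0001"
    warmup_steps: int = 0                    # "no warm-up steps"


if __name__ == "__main__":
    # Print the configuration; remaining hyperparameters are assumed to stay at model defaults.
    print(ContinuedPretrainConfig())
```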