Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications

Authors: Paul Pu Liang, Chun Kai Ling, Yun Cheng, Alexander Obolenskiy, Yudong Liu, Rohan Pandey, Alex Wilf, Louis-Philippe Morency, Russ Salakhutdinov

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate these estimated bounds and show how they accurately track true interactions. Finally, we show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks. To validate our bounds, we experiment on both synthetic and large real-world datasets with varying amounts of interactions.
Researcher Affiliation | Academia | Carnegie Mellon University, Columbia University, Princeton University
Pseudocode | No | The paper describes algorithms and derivations in prose and mathematical notation but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We believe these results shed light on the intriguing connections between multimodal interactions, modality disagreement, and model performance, and release our code and models at https://github.com/pliang279/PID.
Open Datasets | Yes | We use a collection of 10 real-world datasets from MultiBench (Liang et al., 2021) which add up to a size of more than 700,000 datapoints.
Dataset Splits | No | The paper mentions a "test split" ("For the test split, we use unsupervised clustering...") but does not provide specific details on how the training and validation sets are partitioned (e.g., percentages, sample counts, or explicit standard splits).
Hardware Specification | Yes | All training is done on TPU v2-8 accelerators, with continuous pretraining taking 30 minutes and using up to 9GB of memory.
Software Dependencies | No | The paper mentions using Python for analysis and cites various libraries in its references, but it does not specify exact version numbers for any key software components or libraries used in their implementation.
Experiment Setup | Yes | We continuously pretrain and then finetune a pretrained MERLOT Reserve Base model on the datasets with a batch size of 8. During pretraining, we train the model for 960 steps with a learning rate of 0.0001, and no warm-up steps, and use the defaults for other hyperparameters.
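As an illustration of the Experiment Setup row above, the following is a minimal sketch that collects the quoted hyperparameters (batch size 8, 960 pretraining steps, learning rate 0.0001, no warm-up steps) into a single configuration object. The names used here (`ContinuedPretrainConfig`, `warmup_steps`, etc.) are illustrative assumptions and do not come from the released PID repository or the MERLOT Reserve training scripts; other hyperparameters are left at defaults, as the paper states.

```python
from dataclasses import dataclass


@dataclass
class ContinuedPretrainConfig:
    """Hyperparameters quoted in the Experiment Setup row; field names are illustrative."""
    model_name: str = "MERLOT Reserve Base"  # pretrained model that is continuously pretrained, then finetuned
    batch_size: int = 8                      # "a batch size of 8"
    pretrain_steps: int = 960                # "train the model for 960 steps"
    learning_rate: float = 1e-4              # "a learning rate of 0.0001"
    warmup_steps: int = 0                    # "no warm-up steps"


if __name__ == "__main__":
    # Print the configuration; remaining hyperparameters are assumed to stay at model defaults.
    print(ContinuedPretrainConfig())
```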