Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities

Authors: Adriel Saporta, Aahlad Manas Puli, Mark Goldstein, Rajesh Ranganath

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that Symile outperforms pairwise CLIP on cross-modal classification and retrieval across several experiments, including on a multilingual dataset of images, text and audio of over 33M examples and a clinical dataset of chest X-rays, electrocardiograms, and laboratory measurements. We show that Symile retains its advantage over pairwise CLIP even with modalities missing in the data. We publicly release both the multilingual and the clinical datasets, which are specifically designed to test a model's ability to capture higher-order information between three distinct high-dimensional data types.
Researcher Affiliation | Academia | Adriel Saporta, Aahlad Puli, Mark Goldstein, Rajesh Ranganath (New York University)
Pseudocode | Yes | Algorithm 1: Pseudocode for implementation of Symile with O(N) negative sampling (an illustrative loss sketch follows this table).
Open Source Code | Yes | All datasets and code used in this work are publicly available at https://github.com/rajesh-lab/symile.
Open Datasets | Yes | All datasets and code used in this work are publicly available at https://github.com/rajesh-lab/symile.
Dataset Splits | Yes | For each of the three datasets, 10M training, 500K validation, and 500K test samples were generated. ... We split our dataset (11,622 admissions) into a train/validation development set (95% of patients) and a test set (5% of patients), ensuring there is no patient overlap across the splits. (A patient-level split sketch follows this table.)
Hardware Specification | Yes | Experiments were conducted with 16 CPUs, 200GB of RAM, and a single NVIDIA A100 80GB PCIe GPU.
Software Dependencies | No | The paper references specific software such as the AdamW optimizer [32], Whisper [41] (Hugging Face model id openai/whisper-large-v3), CLIP [40] (Hugging Face model id openai/clip-vit-large-patch14), and XLM-RoBERTa [13] (Hugging Face model id xlm-roberta-large), but explicit Python library versions (e.g., PyTorch 1.9) are not provided, making it difficult to fully reproduce the software environment. (A sketch of loading the referenced checkpoints follows this table.)
Experiment Setup | Yes | For all experiments, we use the AdamW optimizer [32]. Following [40], the temperature parameter τ is directly optimized during training as a multiplicative scalar to avoid the need for separate hyperparameter tuning. ... Both Symile and CLIP are trained for 100 epochs using a batch size of 1000, a learning rate of 0.1, and a weight decay of 0.01. The learned temperature parameter τ is initialized to 0.3. The Symile loss is trained with O(N) negative sampling. (A training-setup sketch follows this table.)
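
The Symile loss scores each triple with the multilinear inner product (the sum over the feature dimension of the element-wise product of the three representations) and contrasts aligned triples against in-batch negatives. Below is a minimal, illustrative sketch of such a loss with O(N) candidates per anchor; the name symile_style_loss and the particular way negatives are formed (reusing the other two modalities at a shared index) are assumptions for illustration and may differ from the paper's Algorithm 1.

```python
import torch
import torch.nn.functional as F

def symile_style_loss(za, zb, zc, logit_scale):
    """Illustrative Symile-style contrastive loss with O(N) candidates per anchor.

    za, zb, zc: (N, d) batches of representations for the three modalities.
    logit_scale: positive scalar (e.g. 1 / tau) multiplying the MIP scores.

    For each anchor modality, row i's positive is the aligned triple (i, i, i);
    the remaining candidates reuse the other two modalities at index j, giving
    N candidates per anchor instead of N^2. This is one plausible O(N)
    construction, not necessarily the authors' exact negative-sampling scheme.
    """
    N = za.shape[0]
    labels = torch.arange(N, device=za.device)

    # logits_a[i, j] = MIP(za_i, zb_j, zc_j) = sum_d za[i,d] * zb[j,d] * zc[j,d]
    logits_a = logit_scale * torch.einsum("id,jd,jd->ij", za, zb, zc)
    logits_b = logit_scale * torch.einsum("id,jd,jd->ij", zb, za, zc)
    logits_c = logit_scale * torch.einsum("id,jd,jd->ij", zc, za, zb)

    # Average the cross-entropy over the three anchor modalities.
    return (F.cross_entropy(logits_a, labels)
            + F.cross_entropy(logits_b, labels)
            + F.cross_entropy(logits_c, labels)) / 3.0
```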
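
The clinical split is performed at the patient level so that no patient appears in both the development and test sets. A minimal sketch of one way to perform such a grouped split, using scikit-learn's GroupShuffleSplit on hypothetical admission and patient-ID arrays (not the paper's actual code):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical placeholders: one row per admission, each tagged with a patient ID.
admissions = np.arange(11622)
patient_ids = np.random.randint(0, 9000, size=11622)

# Grouping by patient ID guarantees no patient overlap across the
# 95% development / 5% test splits described in the quoted protocol.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.05, random_state=0)
dev_idx, test_idx = next(splitter.split(admissions, groups=patient_ids))
```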
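
The checkpoints named in the dependency review can be loaded through Hugging Face transformers. The specific model classes chosen below, and any pooling or projection layers on top of them, are assumptions; the paper does not pin exact library versions.

```python
from transformers import WhisperModel, CLIPVisionModel, XLMRobertaModel

# Pretrained encoders referenced in the paper, loaded from the Hugging Face Hub.
audio_encoder = WhisperModel.from_pretrained("openai/whisper-large-v3").encoder   # audio
image_encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")  # images
text_encoder = XLMRobertaModel.from_pretrained("xlm-roberta-large")               # multilingual text
```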
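
Finally, a self-contained sketch of the quoted training configuration (AdamW, learning rate 0.1, weight decay 0.01, batch size 1000, and a directly learned temperature initialized to 0.3), reusing symile_style_loss from the sketch above. The linear projection heads and random features are hypothetical stand-ins for the paper's encoders, and learning log(1/τ) is just one way to realize a learned multiplicative temperature.

```python
import torch

# Hypothetical projection heads standing in for the paper's encoders.
d = 128
proj_a, proj_b, proj_c = (torch.nn.Linear(512, d) for _ in range(3))

# Learn log(1/tau) so the multiplicative scale stays positive; tau starts at 0.3.
log_inv_tau = torch.nn.Parameter(torch.log(torch.tensor(1.0 / 0.3)))

params = [log_inv_tau] + [p for m in (proj_a, proj_b, proj_c) for p in m.parameters()]
optimizer = torch.optim.AdamW(params, lr=0.1, weight_decay=0.01)

# One illustrative optimization step on random features (batch size 1000, as quoted).
xa, xb, xc = (torch.randn(1000, 512) for _ in range(3))
loss = symile_style_loss(proj_a(xa), proj_b(xb), proj_c(xc), log_inv_tau.exp())
optimizer.zero_grad()
loss.backward()
optimizer.step()
```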