Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities
Authors: Adriel Saporta, Aahlad Manas Puli, Mark Goldstein, Rajesh Ranganath
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that Symile outperforms pairwise CLIP on cross-modal classification and retrieval across several experiments including on a multilingual dataset of images, text and audio of over 33M examples and a clinical dataset of chest X-rays, electrocardiograms, and laboratory measurements. We show that Symile retains its advantage over pairwise CLIP even with modalities missing in the data. We publicly release both the multilingual and the clinical datasets, which are specifically designed to test a model's ability to capture higher-order information between three distinct high-dimensional data types. |
| Researcher Affiliation | Academia | Adriel Saporta, Aahlad Puli, Mark Goldstein, Rajesh Ranganath (New York University) |
| Pseudocode | Yes | Algorithm 1 Pseudocode for implementation of Symile with O(N) negative sampling |
| Open Source Code | Yes | All datasets and code used in this work are publicly available at https://github.com/rajesh-lab/symile. |
| Open Datasets | Yes | All datasets and code used in this work are publicly available at https://github.com/rajesh-lab/symile. |
| Dataset Splits | Yes | For each of the three datasets, 10M training, 500K validation, and 500K test samples were generated. ... We split our dataset (11,622 admissions) into a train/validation development set (95% of patients) and a test set (5% of patients), ensuring there is no patient overlap across the splits. |
| Hardware Specification | Yes | Experiments were conducted with 16 CPUs, 200GB of RAM, and a single NVIDIA A100 80GB PCIe GPU. |
| Software Dependencies | No | The paper mentions software such as the AdamW optimizer [32], Whisper [41] (Hugging Face model id openai/whisper-large-v3), CLIP [40] (Hugging Face model id openai/clip-vit-large-patch14), and XLM-RoBERTa [13] (Hugging Face model id xlm-roberta-large). While these specific pretrained models are referenced, explicit Python library versions (e.g., PyTorch 1.9) are not provided, making it difficult to fully reproduce the software environment. (See the loading sketch after the table.) |
| Experiment Setup | Yes | For all experiments, we use the AdamW optimizer [32]. Following [40], the temperature parameter τ is directly optimized during training as a multiplicative scalar to avoid the need for separate hyperparameter tuning. ... Both Symile and CLIP are trained for 100 epochs using a batch size of 1000, a learning rate of 0.1, and a weight decay of 0.01. The learned temperature parameter τ is initialized to 0.3. The Symile loss is trained with O(N) negative sampling. (An illustrative sketch of this objective and configuration follows the table.) |
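The Software Dependencies row names specific Hugging Face checkpoints but no library versions. As a minimal sketch, those checkpoints could be loaded with the `transformers` library as shown below; the choice of `transformers` (and its version, left unpinned) is an assumption, since the paper does not state the software environment.

```python
# Hypothetical loading of the pretrained encoders named in the Software Dependencies
# row; library versions are not specified in the paper and are left unpinned here.
from transformers import CLIPModel, WhisperModel, XLMRobertaModel

image_encoder = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")  # CLIP [40]
audio_encoder = WhisperModel.from_pretrained("openai/whisper-large-v3")     # Whisper [41]
text_encoder = XLMRobertaModel.from_pretrained("xlm-roberta-large")         # XLM-RoBERTa [13]
```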
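The Pseudocode and Experiment Setup rows reference Algorithm 1 (Symile with O(N) negative sampling) and the reported training hyperparameters, but the algorithm body itself is not quoted in the table. Below is a minimal PyTorch sketch of what such an objective and configuration could look like, assuming a multilinear-inner-product score over three embeddings; the function and variable names (`symile_loss_on`, `r_a`, `r_b`, `r_c`, `encoders`) are illustrative, not the authors', and the paper's Algorithm 1 remains the authoritative reference.

```python
# Hypothetical sketch (not the authors' Algorithm 1): one plausible reading of a
# Symile-style objective with O(N) negative sampling, assuming the score of a
# triple is the multilinear inner product (MIP) of its three embeddings.
import torch
import torch.nn.functional as F

def symile_loss_on(r_a, r_b, r_c, logit_scale):
    """r_a, r_b, r_c: [N, d] embeddings of the three modalities for N positive
    triples; logit_scale is the learned inverse temperature 1/tau."""
    n = r_a.size(0)
    labels = torch.arange(n, device=r_a.device)
    # Row i of each logits matrix scores all N candidates of one modality against
    # the fixed positive pair from the other two, so each positive triple is
    # contrasted against N - 1 negatives per term: O(N) negatives overall.
    logits_a = logit_scale * (r_b * r_c) @ r_a.T  # entry (i, j) = MIP(a_j, b_i, c_i)
    logits_b = logit_scale * (r_a * r_c) @ r_b.T  # entry (i, j) = MIP(a_i, b_j, c_i)
    logits_c = logit_scale * (r_a * r_b) @ r_c.T  # entry (i, j) = MIP(a_i, b_i, c_j)
    return (F.cross_entropy(logits_a, labels)
            + F.cross_entropy(logits_b, labels)
            + F.cross_entropy(logits_c, labels)) / 3.0

# Training configuration following the hyperparameters quoted above (batch size 1000,
# lr 0.1, weight decay 0.01, learned temperature initialized to 0.3). `encoders` is a
# placeholder for the three modality encoders producing r_a, r_b, r_c.
# logit_scale = torch.nn.Parameter(torch.log(torch.tensor(1.0 / 0.3)))
# optimizer = torch.optim.AdamW(
#     list(encoders.parameters()) + [logit_scale], lr=0.1, weight_decay=0.01)
# loss = symile_loss_on(r_a, r_b, r_c, logit_scale.exp())
```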