Contrasting Multiple Representations with the Multi-Marginal Matching Gap

Authors: Zoe Piran, Michal Klein, James Thornton, Marco Cuturi

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate improved performance over multiview extensions of pairwise losses, for both self-supervised and multimodal tasks.
Researcher Affiliation | Collaboration | 1 Apple; 2 Hebrew University of Jerusalem.
Pseudocode | Yes | Algorithm 1: Multi-marginal Sinkhorn (MM-S); see the illustrative sketch after this table.
Open Source Code | No | The paper states 'To perform experiments, we implemented the multi-marginal Sinkhorn algorithm (Alg. 1) in PyTorch (Paszke et al., 2019)' but provides no explicit statement of, or link to, a release of its own M3G implementation. The GitHub link given points to a codebase the authors reused, not necessarily their own.
Open Datasets | Yes | We test the M3G loss in an SSL setting (ImageNet-1k) and two multimodal tasks (DomainNet and PhysioNet), citing (Deng et al., 2009), (Peng et al., 2019), and (Goldberger et al., 2000; Ghassemi et al., 2018; Kemp et al., 2000) respectively.
Dataset Splits | Yes | We consider a domain adaptation (DA) task, where the goal is to learn a common encoder, followed by one or multiple classifiers, using labeled data from multiple domains. We quantify the generalization power of this pre-trained encoder with a classification task, tested on data coming from a new, completely unseen domain... We pick one domain that acts as the unseen modality, and train representations on the k = 5 remaining domains. And: The train data contains segmented samples of 994 individuals, and the evaluation dataset, Sleep-EDFx (Goldberger et al., 2000; Kemp et al., 2000), contains 153 nights of sleep recordings from 78 individuals.
Hardware Specification | Yes | All results are given for the same per-GPU batch size (n = 64), 300 epochs, ε = 0.2 for M3G, run on 4 nodes of 8 A100 GPUs.
Software Dependencies | No | The paper mentions implementing parts in PyTorch (Paszke et al., 2019) but does not provide specific version numbers for PyTorch or other software dependencies.
Experiment Setup | Yes | In Table A1 we provide the hyperparameters used to train ImageNet-1k and DomainNet models. ... Batch size 2048 (ImageNet-1k), 512 (DomainNet)... Training duration (epochs) 300... Optimizer AdamW... Base learning rate 6.5 × 10⁻⁴... Per-GPU batch size 64 (ImageNet-1k), 16 (DomainNet). These values are restated as a configuration sketch below.
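
The pseudocode row above refers to Algorithm 1, a multi-marginal Sinkhorn (MM-S) iteration that the authors implemented in PyTorch but did not release. As a rough illustration of that kind of iteration, here is a minimal PyTorch sketch for a dense k-way cost tensor with uniform marginals. The function name, the dense n^k cost representation, and the fixed iteration count are assumptions made for illustration; this is not the authors' Algorithm 1 code.

```python
import math
import torch


def multimarginal_sinkhorn(cost, epsilon=0.2, n_iters=100):
    """Toy multi-marginal Sinkhorn iteration (illustrative sketch only).

    cost : tensor with k dimensions of size n each, giving the cost of
           every k-tuple of samples.
    Returns one dual potential of length n per marginal, assuming
    uniform marginal weights 1/n.
    """
    k = cost.dim()
    n = cost.shape[0]
    log_a = torch.full((n,), math.log(1.0 / n))  # log of uniform marginal weights
    potentials = [torch.zeros(n) for _ in range(k)]

    for _ in range(n_iters):
        for i in range(k):
            # Sum of all other potentials, broadcast against the cost tensor.
            score = -cost
            for l in range(k):
                if l == i:
                    continue
                shape = [1] * k
                shape[l] = n
                score = score + potentials[l].view(shape)
            # Soft-min over every dimension except i enforces marginal i.
            other_dims = tuple(d for d in range(k) if d != i)
            lse = torch.logsumexp(score / epsilon, dim=other_dims)
            potentials[i] = epsilon * (log_a - lse)
    return potentials
```

With k views of n samples each, cost would be a k-dimensional tensor built from the per-view representations; a dense cost tensor grows as n^k, so this sketch is only practical for small n and k.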
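
For convenience, the hyperparameters quoted from Table A1 in the experiment-setup row can be collected into a single configuration dictionary. Only the values are taken from the quoted rows (ε = 0.2 is quoted in the hardware row for the ImageNet-1k M3G runs); the key names and nesting are assumptions made for illustration, not the authors' training code.

```python
# Values quoted from Table A1 and the hardware row of the paper; key names
# and structure are illustrative assumptions, not the authors' code.
TRAINING_CONFIG = {
    "shared": {
        "epochs": 300,
        "optimizer": "AdamW",
        "base_learning_rate": 6.5e-4,
    },
    "imagenet1k": {
        "batch_size": 2048,
        "per_gpu_batch_size": 64,
        "m3g_epsilon": 0.2,  # quoted for the ImageNet-1k M3G runs
    },
    "domainnet": {
        "batch_size": 512,
        "per_gpu_batch_size": 16,
    },
}
```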