EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence

Authors: Chung-Yiu Yau, Hoi To Wai, Parameswaran Raman, Soumajyoti Sarkar, Mingyi Hong

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments validate that EMC2 is effective with small-batch training and achieves performance comparable to or better than baseline algorithms; results are reported for pre-training image encoders on STL-10 and ImageNet-100.
Researcher Affiliation | Collaboration | (1) Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong SAR of China (the work of C.-Y. Yau was done while interning at Amazon Web Services); (2) Amazon Web Services, USA; (3) Department of Electrical and Computer Engineering, University of Minnesota, USA (M. Hong holds concurrent appointments as an Amazon Scholar and as a faculty member at the University of Minnesota; this paper describes his work performed at Amazon). Correspondence to: Chung-Yiu Yau <cyyau@se.cuhk.edu.hk>.
Pseudocode | Yes | Algorithm 1: Efficient MCMC Negative Sampling Method for Contrastive Learning (EMC2). A generic sketch of this style of sampler is given after the table.
Open Source Code | Yes | The code used in the experiments is available at https://github.com/amazon-science/contrastive_emc2.
Open Datasets | Yes | We concentrate on two common datasets under this setup: STL-10 and ImageNet-100. Table 2 (dataset attributes) lists both.
Dataset Splits | No | The paper reports linear-probe (LP) accuracy and 1-nearest-neighbor (1-NN) accuracy and mentions test accuracy in figures, but it does not explicitly state the train/validation/test splits (e.g., percentages or sample counts) needed for reproducibility. A sketch of these evaluation protocols is given after the table.
Hardware Specification | Yes | In this setup, the negative-cache algorithm uses four Tesla T4 GPUs for training and for refreshing the negative cache, while the other algorithms run on a single Tesla T4 GPU.
Software Dependencies | No | The paper mentions using the Adam and LARS optimizers but does not specify software versions (e.g., PyTorch 1.9 or TensorFlow 2.5).
Experiment Setup | Yes | Table 3 lists the hyperparameter values adopted in the experiments: dataset, model, inverse temperature β, batch size b, learning rate γ, feature dimension d, weight decay, cache refresh ρ (Negative Cache), and burn-in steps P (EMC2). A hypothetical config container mirroring these fields is sketched after the table.
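
For orientation, the following is a minimal sketch of the general technique named in the pseudocode row: Metropolis-Hastings negative sampling, where a Markov chain over negative indices targets a distribution proportional to exp(β · similarity). This is not the authors' Algorithm 1; the function names, the uniform proposal, and the toy data are assumptions made purely for illustration.

```python
# A minimal sketch (NumPy only) of Metropolis-Hastings negative sampling for
# contrastive learning: the chain targets p(j | i) proportional to
# exp(beta * <z_i, z_j>). This illustrates the general MCMC idea behind
# EMC^2-style sampling, not the authors' exact Algorithm 1; all names are
# illustrative.
import numpy as np

rng = np.random.default_rng(0)

def mh_negative_step(state_j, anchor_z, all_z, beta):
    """One Metropolis-Hastings update of the negative index for one anchor.

    state_j : current negative index in the chain
    anchor_z: feature of the anchor (query) sample, shape (d,)
    all_z   : (possibly stale) features of all candidate negatives, shape (n, d)
    beta    : inverse temperature of the softmax over negatives
    """
    n = all_z.shape[0]
    proposal = rng.integers(n)                      # uniform proposal over candidates
    log_ratio = beta * (anchor_z @ all_z[proposal] - anchor_z @ all_z[state_j])
    if np.log(rng.random()) < min(0.0, log_ratio):  # accept with prob min(1, exp(log_ratio))
        return proposal
    return state_j

# Toy usage: n candidates with d-dimensional unit-norm features.
n, d, beta = 1000, 64, 5.0
features = rng.standard_normal((n, d))
features /= np.linalg.norm(features, axis=1, keepdims=True)
anchor = features[0]

chain = rng.integers(n)        # initial negative index
burn_in, num_draws = 50, 10    # a burn-in period plays a similar role to "Burn-in Steps P"
for _ in range(burn_in):
    chain = mh_negative_step(chain, anchor, features, beta)
negatives = []
for _ in range(num_draws):
    chain = mh_negative_step(chain, anchor, features, beta)
    negatives.append(chain)
print("sampled negative indices:", negatives)
```

In practice the anchor's own index (and its positive pair) would be excluded from the proposal; the sketch omits that bookkeeping for brevity.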
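
The dataset-splits row refers to linear-probe (LP) and 1-nearest-neighbor (1-NN) accuracy. Below is a minimal, hypothetical sketch of both protocols on pre-extracted encoder features using scikit-learn; the random features and the 80/20 split are placeholders, not the paper's data or splits.

```python
# Minimal sketch of the two evaluation protocols mentioned above, run on
# pre-extracted encoder features. The random data and the 80/20 split are
# arbitrary illustrations, not the paper's setup.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
features = rng.standard_normal((2000, 128))   # stand-in for frozen encoder features
labels = rng.integers(0, 10, size=2000)       # stand-in for class labels

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0)

# Linear probe: a linear classifier trained on frozen features.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("linear-probe accuracy:", probe.score(X_test, y_test))

# 1-NN accuracy: nearest-neighbor classification in feature space.
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print("1-NN accuracy:", knn.score(X_test, y_test))
```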
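
Finally, the experiment-setup row enumerates the hyperparameters of the paper's Table 3. A hypothetical way to capture them as a typed config is sketched below; the field names mirror the table's columns, and no values are filled in because the concrete settings are given only in the paper.

```python
# A hypothetical container for the hyperparameters enumerated in Table 3.
# Field names mirror the table's columns; values are deliberately left unset
# because the specific settings belong to the paper, not to this report.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExperimentConfig:
    dataset: str                      # e.g. "STL-10" or "ImageNet-100"
    model: str                        # encoder architecture
    inverse_temperature_beta: float   # β in the softmax over negatives
    batch_size: int                   # b
    learning_rate: float              # γ
    feature_dim: int                  # d
    weight_decay: float
    cache_refresh_rho: Optional[float] = None  # ρ, Negative Cache baseline only
    burn_in_steps_p: Optional[int] = None      # P, EMC2 only
```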