EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence
Authors: Chung-Yiu Yau, Hoi To Wai, Parameswaran Raman, Soumajyoti Sarkar, Mingyi Hong
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments validate that EMC2 is effective with small batch training and achieves comparable or better performance than baseline algorithms. We report the results for pre-training image encoders on STL-10 and Imagenet-100. |
| Researcher Affiliation | Collaboration | 1Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong SAR of China. The work of C.-Y. Yau was done while interning at Amazon Web Services. 2Amazon Web Services, USA. 3Department of Electrical and Computer Engineering, University of Minnesota, USA. M. Hong holds concurrent appointments as an Amazon Scholar and as a faculty at University of Minnesota. This paper describes his work performed at Amazon. Correspondence to: Chung-Yiu Yau <cyyau@se.cuhk.edu.hk>. |
| Pseudocode | Yes | Algorithm 1 Efficient MCMC Negative Sampling Method for Contrastive Learning (EMC2) |
| Open Source Code | Yes | The code used in the experiments is available at https://github.com/amazon-science/contrastive_emc2. |
| Open Datasets | Yes | We concentrate on two common datasets under this setup: STL-10 and Imagenet-100. Table 2 (dataset attributes) includes STL-10 and Imagenet-100. |
| Dataset Splits | No | The paper discusses metrics like 'linear probe (LP) accuracy' and '1-nearest-neighbor (1-NN) accuracy' and mentions 'test accuracy' in figures, but does not explicitly provide details on train/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | Yes | Note that in this setup, the negative cache algorithm uses four Tesla T4 GPUs for training and refreshing the negative cache, while the other algorithms run on one Tesla T4 GPU. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'LARS optimizer' but does not specify software versions (e.g., 'PyTorch 1.9', 'TensorFlow 2.5'). |
| Experiment Setup | Yes | In Table 3, we list the hyperparameter values adopted in our experiments. Table 3 columns: Dataset, Model, Inverse Temp. β, Batch Size b, Learning Rate γ, Feature Dim. d, Weight Decay, Cache Refresh ρ (Negative Cache), Burn-in Steps P (EMC2). |
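
The Pseudocode row above references Algorithm 1 (EMC$^2$), an MCMC-based negative sampler for contrastive learning. As a rough illustration of the core idea only, the sketch below shows one Metropolis-Hastings step targeting a softmax distribution over candidate negatives; the function name, the uniform proposal, and the toy dimensions are our own assumptions and not the paper's exact procedure (see Algorithm 1 and Table 3 in the paper for the actual algorithm and hyperparameters).

```python
import numpy as np

def mh_negative_sampling_step(state_idx, anchor_emb, candidate_embs, beta, rng):
    """One Metropolis-Hastings step toward p(j) ∝ exp(beta * <anchor, candidate_j>)
    over candidate negatives (illustrative sketch, not the paper's Algorithm 1).

    state_idx      : index of the current negative sample (chain state)
    anchor_emb     : embedding of the anchor example, shape (d,)
    candidate_embs : embeddings of candidate negatives, shape (n, d)
    beta           : inverse temperature (cf. β in Table 3)
    rng            : numpy random Generator
    """
    n = candidate_embs.shape[0]
    # Uniform proposal over candidates (an assumed, symmetric proposal).
    proposal_idx = rng.integers(n)
    # MH acceptance ratio for the softmax target with a symmetric proposal.
    log_ratio = beta * (candidate_embs[proposal_idx] @ anchor_emb
                        - candidate_embs[state_idx] @ anchor_emb)
    if np.log(rng.random()) < min(0.0, log_ratio):
        return proposal_idx  # accept the proposed negative
    return state_idx         # reject and keep the current state


# Toy usage: run a short chain with a few burn-in steps.
rng = np.random.default_rng(0)
d, n = 16, 100
anchor = rng.standard_normal(d)
candidates = rng.standard_normal((n, d))
state = rng.integers(n)
for _ in range(10):  # burn-in steps (cf. the hyperparameter P in Table 3)
    state = mh_negative_sampling_step(state, anchor, candidates, beta=1.0, rng=rng)
print("sampled negative index:", state)
```

In the paper's setting, the candidates would be encoder embeddings of other training examples and the chain state is carried across training iterations, with P burn-in steps (Table 3) before samples are used in the contrastive gradient; the sketch above only conveys the shape of a single sampling step.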