On the Surrogate Gap between Contrastive and Supervised Losses

Authors: Han Bao, Yoshihiro Nagano, Kento Nozawa

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify that our theory is consistent with experiments on synthetic, vision, and language datasets.
Researcher Affiliation | Academia | The University of Tokyo, Tokyo, Japan; RIKEN AIP, Tokyo, Japan. Correspondence to: Han Bao (currently with Kyoto University) <bao@i.kyoto-u.ac.jp>.
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The experimental code to reproduce all figures in the paper is available at https://github.com/nzw0301/gap-contrastive-and-supervised-losses.
Open Datasets | Yes | We used the same datasets as Arora et al. (2019): the CIFAR-100 (Krizhevsky, 2009) and Wiki-3029 (Arora et al., 2019) datasets, along with the CIFAR-10 (Krizhevsky, 2009) dataset.
Dataset Splits | Yes | We treated 10% of the training samples as a validation dataset by sampling uniformly per class. We used the original test dataset for testing. ... we split the dataset into 70%/10%/20% train/validation/test datasets, respectively. (A split sketch in Python is given after the table.)
Hardware Specification | Yes | We implemented our experimental code using PyTorch's (Paszke et al., 2019) distributed data-parallel training (Li et al., 2020) on 8 NVIDIA A100 GPUs provided by the internal cluster.
Software Dependencies | Yes | We used the Adam (Kingma & Ba, 2015) optimizer ... provided by PyTorch (Paszke et al., 2019). ... We also used scikit-learn (Pedregosa et al., 2011) ... matplotlib (Hunter, 2007) and seaborn (Waskom, 2021) via pandas (Reback et al., 2020) ... hydra (Yadan, 2019) and experimental results using Weights & Biases (Biewald, 2020). For effective parallelized execution of our experimental code, we use GNU Parallel (Tange, 2021).
Experiment Setup | Yes | We used the Adam (Kingma & Ba, 2015) optimizer with a weight decay coefficient of 0.01 for all parameters. The mini-batch size was set to B = 1,024 and the number of epochs was 300. The learning rate was set to 0.01 with the ReduceLROnPlateau scheduler (patience: 10 epochs). ... We used the LARC (You et al., 2017) optimizer wrapping momentum SGD, whose momentum term was 0.9. We applied weight decay with coefficient 10^-4 to all parameters except for all bias terms and batch norm parameters. The base learning rate was initialized at lr · B, where lr ∈ {2, 4, 6} × 1/64 and the mini-batch size was B = 1,024. ... The number of epochs was 2,000. (An optimizer-configuration sketch in Python is given after the table.)
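The dataset splits quoted above (a class-stratified 10% validation hold-out from the official training set, and a 70%/10%/20% train/validation/test split) could be reproduced along the following lines. This is a minimal sketch rather than the authors' released code; the function names and the `train_X`, `train_y`, `X`, `y` arrays are hypothetical placeholders, and it assumes scikit-learn's `train_test_split`, which the paper lists among its dependencies.

```python
# Minimal sketch of the quoted splits; not the authors' released code.
# `train_X`, `train_y`, `X`, `y` are hypothetical in-memory arrays of samples and labels.
from sklearn.model_selection import train_test_split


def cifar_style_split(train_X, train_y, seed=0):
    """Hold out 10% of the training set as validation, stratified by class;
    the original test set is kept untouched for testing."""
    tr_X, val_X, tr_y, val_y = train_test_split(
        train_X, train_y, test_size=0.10, stratify=train_y, random_state=seed
    )
    return (tr_X, tr_y), (val_X, val_y)


def wiki3029_style_split(X, y, seed=0):
    """70%/10%/20% train/validation/test split, stratified by class."""
    tr_X, rest_X, tr_y, rest_y = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed
    )
    # Split the remaining 30% into 10% validation and 20% test (ratio 1:2).
    val_X, te_X, val_y, te_y = train_test_split(
        rest_X, rest_y, test_size=2 / 3, stratify=rest_y, random_state=seed
    )
    return (tr_X, tr_y), (val_X, val_y), (te_X, te_y)
```

The `stratify` argument enforces the per-class uniform sampling described in the Dataset Splits row.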
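The two optimizer settings quoted in the Experiment Setup row can be expressed roughly as follows in PyTorch. This is a hedged sketch under the stated hyperparameters (Adam with weight decay 0.01 and ReduceLROnPlateau with patience 10; momentum SGD with momentum 0.9 and weight decay 10^-4 excluded from biases and batch-norm parameters); `model` and `base_lr` are hypothetical inputs, and the LARC wrapper is left as a comment because its import path is not quoted above and would be an assumption here.

```python
# Minimal sketch of the two optimizer settings quoted above; not the authors' code.
# `model` is a hypothetical torch.nn.Module; `base_lr` is the pre-scaled learning rate.
import torch


def adam_setup(model):
    # Adam with weight decay 0.01 on all parameters; ReduceLROnPlateau with patience 10.
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=0.01)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=10)
    return optimizer, scheduler


def momentum_sgd_setup(model, base_lr):
    # Weight decay 1e-4 on all parameters except biases and batch-norm parameters.
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if param.ndim == 1 or name.endswith(".bias"):
            no_decay.append(param)  # biases and norm scales/offsets: no weight decay
        else:
            decay.append(param)
    optimizer = torch.optim.SGD(
        [
            {"params": decay, "weight_decay": 1e-4},
            {"params": no_decay, "weight_decay": 0.0},
        ],
        lr=base_lr,
        momentum=0.9,
    )
    # The paper wraps this optimizer with LARC (You et al., 2017), e.g. via an
    # external implementation: optimizer = LARC(optimizer)  # assumed wrapper
    return optimizer
```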