A Simple Framework for Contrastive Learning of Visual Representations

Authors: Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Most of our study for unsupervised pretraining (learning encoder network f without labels) is done using the ImageNet ILSVRC-2012 dataset (Russakovsky et al., 2015). Some additional pretraining experiments on CIFAR-10 (Krizhevsky & Hinton, 2009) can be found in Appendix B.9. We also test the pretrained results on a wide range of datasets for transfer learning. To evaluate the learned representations, we follow the widely used linear evaluation protocol (Zhang et al., 2016; Oord et al., 2018; Bachman et al., 2019), where a linear classifier is trained on top of the frozen base network, and test accuracy is used as a proxy for representation quality." (a sketch of this linear evaluation protocol appears after the table)
Researcher Affiliation | Industry | Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton (Google Research, Brain Team). Correspondence to: Ting Chen <iamtingchen@google.com>.
Pseudocode | Yes | "Algorithm 1: SimCLR's main learning algorithm." (see the NT-Xent sketch after the table)
Open Source Code | Yes | "Code available at https://github.com/google-research/simclr."
Open Datasets | Yes | "Most of our study for unsupervised pretraining (learning encoder network f without labels) is done using the ImageNet ILSVRC-2012 dataset (Russakovsky et al., 2015). Some additional pretraining experiments on CIFAR-10 (Krizhevsky & Hinton, 2009) can be found in Appendix B.9."
Dataset Splits | Yes | "Following Kornblith et al. (2019), we perform hyperparameter tuning for each model-dataset combination and select the best hyperparameters on a validation set."
Hardware Specification | Yes | "We train our model with Cloud TPUs, using 32 to 128 cores depending on the batch size. With 128 TPU v3 cores, it takes 1.5 hours to train our ResNet-50 with a batch size of 4096 for 100 epochs."
Software Dependencies | No | The information is insufficient. The paper mentions the LARS optimizer and the ResNet architecture, but it does not list specific software dependencies with version numbers.
Experiment Setup | Yes | "Default setting. Unless otherwise specified, for data augmentation we use random crop and resize (with random flip), color distortions, and Gaussian blur (for details, see Appendix A). We use ResNet-50 as the base encoder network, and a 2-layer MLP projection head to project the representation to a 128-dimensional latent space. As the loss, we use NT-Xent, optimized using LARS with a learning rate of 4.8 (= 0.3 × BatchSize/256) and weight decay of 10⁻⁶. We train at batch size 4096 for 100 epochs. Furthermore, we use linear warmup for the first 10 epochs, and decay the learning rate with the cosine decay schedule without restarts (Loshchilov & Hutter, 2016)." (see the augmentation and learning-rate sketches after the table)
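
The Algorithm 1 caption cited in the Pseudocode row refers to the paper's training loop, whose core is the NT-Xent loss named in the Experiment Setup row: for each augmented view in a batch of N images, the other view of the same image is the positive and the remaining 2N - 2 projections in the batch act as negatives. The following is a minimal PyTorch sketch of that loss written for this summary; the authors' released code is TensorFlow-based, and the function name and tensor shapes here are illustrative assumptions rather than the official implementation.

import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature):
    # z1, z2: [N, d] projection-head outputs for two augmented views of the same N images.
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # [2N, d], unit-normalized
    sim = (z @ z.t()) / temperature                      # [2N, 2N] scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # a view is never its own negative
    # Row i's positive is the other view of the same image: i + N for i < N, i - N otherwise.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(sim.device)
    return F.cross_entropy(sim, targets)                 # averaged over all 2N anchor views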
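
The default augmentation listed in the Experiment Setup row (random crop and resize with random flip, color distortion, Gaussian blur) can be approximated with torchvision transforms as below. The jitter strengths, grayscale probability, and blur kernel size follow one reading of the paper's Appendix A and should be treated as assumptions, not a verbatim copy of the released pipeline.

from torchvision import transforms

def simclr_augmentation(size=224, s=1.0):
    # s scales color-distortion strength; the blur kernel is roughly 10% of the image side.
    color_jitter = transforms.ColorJitter(0.8 * s, 0.8 * s, 0.8 * s, 0.2 * s)
    return transforms.Compose([
        transforms.RandomResizedCrop(size),
        transforms.RandomHorizontalFlip(),
        transforms.RandomApply([color_jitter], p=0.8),
        transforms.RandomGrayscale(p=0.2),
        transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0)),
        transforms.ToTensor(),
    ])

Each training image is augmented twice with this pipeline to form the positive pair fed to the loss above.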
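
The optimization recipe in the same row (LARS, learning rate 0.3 × BatchSize/256, 10-epoch linear warmup, cosine decay without restarts over 100 epochs) reduces to a per-step learning-rate schedule like the sketch below; the function name and the step bookkeeping are assumptions made here, not taken from the paper.

import math

def simclr_learning_rate(step, total_steps, warmup_steps, batch_size, base_lr=0.3):
    peak_lr = base_lr * batch_size / 256.0               # 4.8 at batch size 4096
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps       # linear warmup, first 10 epochs
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))  # cosine decay, no restarts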
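
Finally, the linear evaluation protocol quoted in the Research Type row freezes the pretrained base network f and trains only a linear classifier on its features, using test accuracy as the proxy for representation quality. A minimal sketch, assuming a generic PyTorch encoder and a caller-supplied feature dimension (both placeholders, not names from the paper):

import torch.nn as nn

def linear_evaluation_model(encoder, feature_dim, num_classes=1000):
    for p in encoder.parameters():
        p.requires_grad = False          # freeze the pretrained base network f
    encoder.eval()                       # keep normalization statistics fixed
    # Only the linear head receives gradients during linear evaluation.
    return nn.Sequential(encoder, nn.Linear(feature_dim, num_classes))

Note that the paper evaluates the representation h = f(x) (the ResNet output before the projection head), so the encoder passed here should output h rather than the 128-dimensional z.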