A Simple Framework for Contrastive Learning of Visual Representations
Authors: Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Most of our study for unsupervised pretraining (learning encoder network f without labels) is done using the ImageNet ILSVRC-2012 dataset (Russakovsky et al., 2015). Some additional pretraining experiments on CIFAR-10 (Krizhevsky & Hinton, 2009) can be found in Appendix B.9. We also test the pretrained results on a wide range of datasets for transfer learning. To evaluate the learned representations, we follow the widely used linear evaluation protocol (Zhang et al., 2016; Oord et al., 2018; Bachman et al., 2019), where a linear classifier is trained on top of the frozen base network, and test accuracy is used as a proxy for representation quality. *(A sketch of this linear evaluation protocol appears below the table.)* |
| Researcher Affiliation | Industry | Ting Chen¹, Simon Kornblith¹, Mohammad Norouzi¹, Geoffrey Hinton¹ (¹Google Research, Brain Team). Correspondence to: Ting Chen <iamtingchen@google.com>. |
| Pseudocode | Yes | Algorithm 1: SimCLR's main learning algorithm. *(A PyTorch-style sketch of this training step appears below the table.)* |
| Open Source Code | Yes | Code is available at https://github.com/google-research/simclr. |
| Open Datasets | Yes | Most of our study for unsupervised pretraining (learning encoder network f without labels) is done using the ImageNet ILSVRC-2012 dataset (Russakovsky et al., 2015). Some additional pretraining experiments on CIFAR-10 (Krizhevsky & Hinton, 2009) can be found in Appendix B.9. |
| Dataset Splits | Yes | Following Kornblith et al. (2019), we perform hyperparameter tuning for each model-dataset combination and select the best hyperparameters on a validation set. |
| Hardware Specification | Yes | We train our model with Cloud TPUs, using 32 to 128 cores depending on the batch size. With 128 TPU v3 cores, it takes 1.5 hours to train our ResNet-50 with a batch size of 4096 for 100 epochs. |
| Software Dependencies | No | The information is insufficient. The paper mentions the LARS optimizer and the ResNet architecture, but it does not list specific software dependencies or version numbers. |
| Experiment Setup | Yes | Default setting. Unless otherwise specified, for data augmentation we use random crop and resize (with random flip), color distortions, and Gaussian blur (for details, see Appendix A). We use ResNet-50 as the base encoder network, and a 2-layer MLP projection head to project the representation to a 128-dimensional latent space. As the loss, we use NT-Xent, optimized using LARS with a learning rate of 4.8 (= 0.3 × BatchSize/256) and weight decay of 10⁻⁶. We train at batch size 4096 for 100 epochs. Furthermore, we use linear warmup for the first 10 epochs, and decay the learning rate with the cosine decay schedule without restarts (Loshchilov & Hutter, 2016). *(Hedged sketches of the augmentation pipeline, training step, and learning-rate schedule appear below the table.)* |
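To make the setup quoted above concrete, the sketches below reconstruct the main pieces in PyTorch. They are illustrative approximations, not the authors' TensorFlow implementation linked in the "Open Source Code" row.

The default augmentation pipeline (random crop and resize with random flip, color distortion, Gaussian blur) can be approximated with torchvision roughly as follows. The jitter strength `s = 1.0`, the probabilities, and the blur kernel size of about 10% of the image side follow the paper's Appendix A, but treat the exact parameter values as an approximation.

```python
from torchvision import transforms

s = 1.0  # color distortion strength (the paper's default)
simclr_augment = transforms.Compose([
    transforms.RandomResizedCrop(224),                       # random crop and resize
    transforms.RandomHorizontalFlip(),                       # random flip
    transforms.RandomApply(                                  # color distortion
        [transforms.ColorJitter(0.8 * s, 0.8 * s, 0.8 * s, 0.2 * s)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.RandomApply(                                  # Gaussian blur
        [transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0))], p=0.5),
    transforms.ToTensor(),
])
```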
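The "Pseudocode" row cites Algorithm 1, which draws two augmented views of each image in a minibatch, encodes and projects both views, and minimizes the NT-Xent loss over the resulting positive pairs. A minimal sketch of that step, assuming a generic `encoder` and `proj_head` and an illustrative temperature of 0.5:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over N positive pairs (z1[i], z2[i]); the other 2N-2
    projections in the batch serve as negatives for each anchor."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)        # (2N, d), unit norm
    sim = torch.matmul(z, z.t()) / temperature                # scaled cosine similarities
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))                # exclude self-similarity
    # the positive for anchor i is i + N (and vice versa)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def simclr_step(encoder, proj_head, optimizer, x1, x2, temperature=0.5):
    """One training step on two augmented views x1, x2 of the same images."""
    h1, h2 = encoder(x1), encoder(x2)                # representations
    z1, z2 = proj_head(h1), proj_head(h2)            # projections used by the loss
    loss = nt_xent_loss(z1, z2, temperature)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

For the ResNet-50 default quoted above, `proj_head` would be the 2-layer MLP mapping the 2048-dimensional representation to the 128-dimensional latent space, e.g. `Linear(2048, 2048) → ReLU → Linear(2048, 128)`.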
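The quoted learning rate of 4.8 is the linear scaling rule 0.3 × BatchSize/256 evaluated at batch size 4096, combined with 10 epochs of linear warmup and cosine decay without restarts. A small sketch of that schedule (function and argument names are mine):

```python
import math

def simclr_lr(step, total_steps, warmup_steps, batch_size, base_lr=0.3):
    """Learning rate at a given step: linear warmup, then cosine decay.
    The peak follows the scaling rule base_lr * batch_size / 256."""
    peak_lr = base_lr * batch_size / 256             # e.g. 4.8 for batch size 4096
    if step < warmup_steps:
        return peak_lr * step / warmup_steps         # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))  # cosine decay, no restarts
```

Note that the paper optimizes with LARS rather than plain SGD; PyTorch does not ship a LARS optimizer, so in practice this schedule would be paired with a custom or third-party LARS implementation.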
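Finally, the linear evaluation protocol quoted in the "Research Type" row trains a linear classifier on top of the frozen base network and reports test accuracy as a proxy for representation quality. A minimal sketch, assuming a frozen `encoder` that maps images to 2048-dimensional features; the classifier hyperparameters (epochs, learning rate) here are illustrative, not the paper's:

```python
import torch
import torch.nn.functional as F

def linear_eval(encoder, train_loader, test_loader, feat_dim=2048,
                num_classes=1000, epochs=90, lr=0.1, device="cpu"):
    """Linear evaluation: freeze the encoder, train only a linear classifier,
    and return test accuracy as a proxy for representation quality."""
    encoder.eval()                                   # frozen base network
    clf = torch.nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.SGD(clf.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in train_loader:
            with torch.no_grad():                    # no gradients through the encoder
                h = encoder(x.to(device))
            loss = F.cross_entropy(clf(h), y.to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()
    correct = total = 0
    with torch.no_grad():
        for x, y in test_loader:
            pred = clf(encoder(x.to(device))).argmax(dim=1)
            correct += (pred == y.to(device)).sum().item()
            total += y.size(0)
    return correct / total
```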