On the duality between contrastive and non-contrastive self-supervised learning
Authors: Quentin Garrido, Yubei Chen, Adrien Bardes, Laurent Najman, Yann LeCun
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | While we have discussed how close sample-contrastive and dimension-contrastive methods are in theory, one of the primary considerations when choosing or designing a method is performance on downstream tasks. Linear classification on ImageNet has been the main focus of most SSL methods, so we focus on this task. We consider the following two aspects, which are responsible for most of the discrepancies between methods. |
| Researcher Affiliation | Collaboration | Quentin Garrido (1,2), Yubei Chen (1,5), Adrien Bardes (1,3), Laurent Najman (2), Yann LeCun (1,4,5). Affiliations: 1 Meta AI, FAIR; 2 Univ Gustave Eiffel, CNRS, LIGM, F-77454 Marne-la-Vallée, France; 3 Inria, École normale supérieure, CNRS, PSL Research University; 4 Courant Institute, New York University; 5 Center for Data Science, New York University |
| Pseudocode | Yes | In order to reproduce our main figure, we also give the numerical performance in Table S5. All of this should make our results reproducible and, more importantly, should let practitioners benefit from the improved performance that we introduce. ... Supplementary section L (VICReg variations pseudocode), Algorithm 1: VICReg-exp PyTorch pseudocode. (A hedged sketch of the base VICReg loss that these variants build on follows the table.) |
| Open Source Code | Yes | While our pretrainings are very costly, each taking around a day with 8 V100 GPUs, we provide complete hyperparameter values in Table S6. They are compatible with official implementations of the losses, and for VICReg-ctr and VICReg-exp we also provide PyTorch pseudocode in supplementary section L. |
| Open Datasets | Yes | We even see a small increase in top-1 accuracy on ImageNet (Deng et al., 2009) with linear evaluation when using SimCLR-abs, where we reach 68.71% top-1 accuracy, compared to 68.61% with our improved reproduction of SimCLR. |
| Dataset Splits | Yes | Linear classification on ImageNet has been the main focus of most SSL methods, so we focus on this task. ... The performance of this online classifier correlates almost perfectly with its offline counterpart, so we can rely on it to discuss the general behaviors of various methods. This evaluation was briefly mentioned in Chen et al. (2020a) but without experimental support. (A sketch of such an online probe follows the table.) |
| Hardware Specification | Yes | Each experiment was run on 8 Nvidia V100 GPUs, with 32GB of memory each, and took around 24 hours to complete. |
| Software Dependencies | No | The paper mentions "PyTorch pseudocode" but does not specify version numbers for PyTorch, Python, or any other libraries or software dependencies. |
| Experiment Setup | Yes | For training, we follow common procedure and use a ResNet-50 backbone (He et al., 2016) with the LARS optimizer (You et al., 2017). We use by default a base learning rate of 0.3 and compute the effective learning rate as lr = base_lr × batch_size / 256. We also use a momentum of 0.9 and a weight decay of 10^-6. The learning rate follows a cosine annealing schedule after a 10-epoch linear warmup. We train for 100 epochs in all of our experiments. For data augmentation, we follow the protocol of BYOL (Grill et al., 2020), summarized in Table S1 (image augmentation parameters, taken from Grill et al., 2020). ... See supplementary section K for the exact hyperparameters used for each experiment. (A sketch of this learning-rate recipe follows the table.) |
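
For context on the Pseudocode row: the paper's VICReg-exp and VICReg-ctr variants build on the base VICReg objective. The sketch below is a minimal PyTorch rendition of that base loss with its commonly published default coefficients (25, 25, 1); it is an illustration only, not the paper's variant pseudocode, which lives in its supplementary section L.

```python
# Minimal sketch of a VICReg-style loss, assuming two embedding batches
# z_a, z_b of shape (N, D). Coefficients follow common VICReg defaults;
# this is NOT the paper's VICReg-exp or VICReg-ctr variant.
import torch
import torch.nn.functional as F

def vicreg_loss(z_a, z_b, sim_coeff=25.0, std_coeff=25.0, cov_coeff=1.0):
    n, d = z_a.shape
    # Invariance term: mean-squared error between the two views.
    inv_loss = F.mse_loss(z_a, z_b)
    # Variance term: hinge loss keeping each dimension's std above 1.
    std_a = torch.sqrt(z_a.var(dim=0) + 1e-4)
    std_b = torch.sqrt(z_b.var(dim=0) + 1e-4)
    std_loss = F.relu(1.0 - std_a).mean() + F.relu(1.0 - std_b).mean()
    # Covariance term: penalize off-diagonal covariance entries,
    # decorrelating embedding dimensions (the dimension-contrastive part).
    z_a_c = z_a - z_a.mean(dim=0)
    z_b_c = z_b - z_b.mean(dim=0)
    cov_a = (z_a_c.T @ z_a_c) / (n - 1)
    cov_b = (z_b_c.T @ z_b_c) / (n - 1)
    def off_diag(m):
        return m - torch.diag(torch.diag(m))
    cov_loss = off_diag(cov_a).pow(2).sum() / d + off_diag(cov_b).pow(2).sum() / d
    return sim_coeff * inv_loss + std_coeff * std_loss + cov_coeff * cov_loss
```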
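The online linear classifier cited under Dataset Splits is also easy to sketch: a linear head trained on detached backbone features during pretraining, so it tracks representation quality without influencing the backbone. Every name and hyperparameter below is an illustrative assumption, not the paper's exact setup.

```python
# Hypothetical online linear probe for SSL pretraining. Assumes a backbone
# producing 2048-d features (ResNet-50) and 1000 ImageNet classes; the
# probe's learning rate is an assumption.
import torch
import torch.nn as nn

probe = nn.Linear(2048, 1000)
probe_opt = torch.optim.SGD(probe.parameters(), lr=0.02)
criterion = nn.CrossEntropyLoss()

def online_probe_step(features, labels):
    # detach() stops the probe's gradient from reaching the backbone,
    # so the probe measures the representation without shaping it.
    logits = probe(features.detach())
    loss = criterion(logits, labels)
    probe_opt.zero_grad()
    loss.backward()
    probe_opt.step()
    # Return batch top-1 accuracy, the quantity tracked during training.
    return logits.argmax(dim=1).eq(labels).float().mean().item()
```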
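Finally, the learning-rate recipe quoted under Experiment Setup (linear scaling, 10-epoch warmup, cosine annealing over 100 epochs) can be written as a small helper. Only base_lr = 0.3, the scaling rule, the warmup length, and the epoch count come from the quote; the batch size of 2048 and the per-epoch schedule granularity are assumptions.

```python
import math

def effective_lr(base_lr=0.3, batch_size=2048):
    # Linear scaling rule from the paper: lr = base_lr * batch_size / 256.
    return base_lr * batch_size / 256

def lr_at_epoch(epoch, total_epochs=100, warmup_epochs=10,
                base_lr=0.3, batch_size=2048):
    lr = effective_lr(base_lr, batch_size)
    if epoch < warmup_epochs:
        return lr * (epoch + 1) / warmup_epochs  # linear warmup
    # Cosine annealing from the peak lr down to 0 after warmup.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```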