VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
Authors: Adrien Bardes, Jean Ponce, Yann LeCun
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the representations obtained after self-supervised pretraining of a ResNet-50 He et al. (2016) backbone with VICReg during 1000 epochs, on the training set of ImageNet, using the training protocol described in Section 4. We also pretrain on pairs of image and text data and evaluate on retrieval tasks on the MS-COCO dataset. |
| Researcher Affiliation | Collaboration | Adrien Bardes (1,2), Jean Ponce (2,4), Yann LeCun (1,3,4); (1) Facebook AI Research, (2) Inria, École normale supérieure, CNRS, PSL Research University, (3) Courant Institute, New York University, (4) Center for Data Science, New York University |
| Pseudocode | Yes | Algorithm 1: VICReg PyTorch pseudocode. |
| Open Source Code | No | The paper references third-party libraries (e.g., VISSL, detectron2) that have open-source code, but it does not contain an explicit statement or link confirming the release of the authors' own VICReg implementation code for this paper. |
| Open Datasets | Yes | Implementation details for pretraining with VICReg on the 1000-class ImageNet dataset without labels are as follows. Evaluation of the representations obtained with a ResNet-50 backbone pretrained with VICReg on: (1) linear classification on top of the frozen representations from ImageNet; (2) semi-supervised classification on top of the fine-tuned representations from 1% and 10% of ImageNet samples. We also pretrain on pairs of image and text data and evaluate on retrieval tasks on the MS-COCO dataset. |
| Dataset Splits | Yes | We compare in Table 1 our results on both tasks against other methods on the validation set of ImageNet. We use the standard split of ESC-50 Piczak (2015), composed of 1600 training audio samples and 400 validation samples. |
| Hardware Specification | Yes | All methods are run by us on 32 Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions software such as "PyTorch", the "VISSL library", the "detectron2 library", and "LIBLINEAR", but does not provide specific version numbers for these components. |
| Experiment Setup | Yes | Coefficients λ and µ are both 25 and ν is 1 in Eq. (6), and ϵ is 0.0001 in Eq. (1). The encoder network fθ is a standard ResNet-50 backbone He et al. (2016) with 2048 output units. The expander hφ is composed of two fully-connected layers with batch normalization (BN) Ioffe & Szegedy (2015) and ReLU, and a third linear layer. The sizes of all 3 layers were set to 8192. The training protocol follows those of BYOL and Barlow Twins: LARS optimizer You et al. (2017); Goyal et al. (2017) run for 1000 epochs with a weight decay of 10⁻⁶ and a learning rate lr = batch_size/256 × base_lr, where batch_size is set to 2048 by default and base_lr is a base learning rate set to 0.2. The learning rate follows a cosine decay schedule Loshchilov & Hutter (2017), starting from 0 with 10 warmup epochs and a final value of 0.002. (Illustrative sketches of the loss and expander head are given after this table.) |
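
For readability, here is a minimal PyTorch sketch of the three-term VICReg objective described in the Experiment Setup row, assuming the standard variance/invariance/covariance formulation with λ = µ = 25, ν = 1 and ϵ = 0.0001. The function name `vicreg_loss` and all variable names are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of the VICReg loss, not the authors' code.
import torch
import torch.nn.functional as F

def vicreg_loss(z_a, z_b, lambd=25.0, mu=25.0, nu=1.0, eps=1e-4):
    """z_a, z_b: (batch_size, dim) expander outputs for the two augmented views."""
    n, d = z_a.shape

    # Invariance term: mean-squared error between the two embeddings.
    inv_loss = F.mse_loss(z_a, z_b)

    # Variance term: hinge loss pushing each dimension's standard deviation towards 1.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var_loss = torch.mean(F.relu(1.0 - std_a)) + torch.mean(F.relu(1.0 - std_b))

    # Covariance term: penalize squared off-diagonal entries of the covariance matrices.
    z_a = z_a - z_a.mean(dim=0)
    z_b = z_b - z_b.mean(dim=0)
    cov_a = (z_a.T @ z_a) / (n - 1)
    cov_b = (z_b.T @ z_b) / (n - 1)
    off_diag_a = cov_a.pow(2).sum() - cov_a.pow(2).diagonal().sum()
    off_diag_b = cov_b.pow(2).sum() - cov_b.pow(2).diagonal().sum()
    cov_loss = off_diag_a / d + off_diag_b / d

    return lambd * inv_loss + mu * var_loss + nu * cov_loss
```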
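
Likewise, a short sketch of the expander head and the learning-rate scaling quoted in the same row, assuming the 2048-dimensional ResNet-50 output and the default batch size of 2048. The helper `make_expander` is hypothetical and only illustrates the described architecture (two FC + BN + ReLU layers followed by a linear layer, all of size 8192).

```python
# Hedged sketch of the expander head and lr scaling, not the authors' code.
import torch.nn as nn

def make_expander(in_dim=2048, hidden_dim=8192):
    # Two fully-connected layers with batch normalization and ReLU, then a third linear layer.
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, hidden_dim),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, hidden_dim),
    )

batch_size, base_lr = 2048, 0.2
lr = batch_size / 256 * base_lr  # 16.0 with the defaults quoted above
```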