Barlow Twins: Self-Supervised Learning via Redundancy Reduction

Authors: Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, Stéphane Deny

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Barlow Twins outperforms previous methods on ImageNet for semi-supervised classification in the low-data regime, and is on par with the current state of the art for ImageNet classification with a linear classifier head and for transfer tasks of classification and object detection.
Researcher Affiliation | Collaboration | Facebook AI Research; New York University, NY, USA. Correspondence to: Jure Zbontar <jzb@fb.com>, Li Jing <ljng@fb.com>, Ishan Misra <imisra@fb.com>, Yann LeCun <yann@fb.com>, Stéphane Deny <stephane.deny.pro@gmail.com>.
Pseudocode | Yes | The pseudocode for Barlow Twins is shown as Algorithm 1 (see the PyTorch-style sketch after this table).
Open Source Code | Yes | Code and pre-trained models (in PyTorch) are available at https://github.com/facebookresearch/barlowtwins
Open Datasets | Yes | Our network is pretrained using self-supervised learning on the training set of the ImageNet ILSVRC-2012 dataset (Deng et al., 2009), without labels.
Dataset Splits | Yes | The top-1 and top-5 accuracies obtained on the ImageNet validation set are reported in Table 1.
Hardware Specification | Yes | Training is distributed across 32 V100 GPUs and takes approximately 124 hours.
Software Dependencies | No | The paper mentions 'PyTorch-style pseudocode' and states that 'Code and pre-trained models (in PyTorch) are available', indicating the use of PyTorch. However, it does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | We use the LARS optimizer (You et al., 2017) and train for 1000 epochs with a batch size of 2048. We use a learning rate of 0.2 for the weights and 0.0048 for the biases and batch normalization parameters. We multiply the learning rate by the batch size and divide it by 256. We use a learning rate warm-up period of 10 epochs, after which we reduce the learning rate by a factor of 1000 using a cosine decay schedule (Loshchilov & Hutter, 2016). We ran a search for the trade-off parameter λ of the loss function and found the best results for λ = 5 × 10⁻³. We use a weight decay parameter of 1.5 × 10⁻⁶. (A worked learning-rate sketch also follows this table.)
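For reference, here is a minimal PyTorch sketch of the Barlow Twins objective along the lines of the paper's Algorithm 1. It is an illustration, not the authors' released code: the `off_diagonal` helper, function names, and the default `lambd=5e-3` (the reported best trade-off) are assumptions made for this sketch.

```python
import torch

def off_diagonal(c):
    """Return a flattened view of all off-diagonal elements of a square matrix."""
    n, m = c.shape
    assert n == m
    return c.flatten()[:-1].view(n - 1, n + 1)[:, 1:].flatten()

def barlow_twins_loss(z_a, z_b, lambd=5e-3):
    """Barlow Twins loss for two batches of embeddings, each of shape (N, D)."""
    n, _ = z_a.shape
    # Normalize each embedding dimension along the batch (zero mean, unit std).
    z_a = (z_a - z_a.mean(0)) / z_a.std(0)
    z_b = (z_b - z_b.mean(0)) / z_b.std(0)
    # Empirical cross-correlation matrix between the two views, shape (D, D).
    c = (z_a.T @ z_b) / n
    # Invariance term: pull the diagonal of c toward 1.
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    # Redundancy-reduction term: push the off-diagonal of c toward 0.
    off_diag = off_diagonal(c).pow(2).sum()
    return on_diag + lambd * off_diag

# Example usage with random embeddings standing in for the two augmented views:
z1, z2 = torch.randn(256, 8192), torch.randn(256, 8192)
loss = barlow_twins_loss(z1, z2)
```

The normalization along the batch dimension plays the role of the batch normalization applied to the embeddings in the paper, and the two terms correspond to the invariance and redundancy-reduction parts of the loss weighted by λ.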
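As a worked example of the learning-rate recipe quoted in the Experiment Setup row: with a batch size of 2048, the linear scaling rule gives peak rates of 0.2 × 2048/256 = 1.6 for the weights and 0.0048 × 2048/256 = 0.0384 for the biases and batch-normalization parameters. The sketch below shows one plausible warm-up-plus-cosine schedule consistent with that description; the exact schedule shape and function names are assumptions, and the LARS optimizer itself is not shown.

```python
import math

def scaled_lr(base_lr, batch_size=2048):
    # Linear scaling rule from the paper: multiply the base rate by batch_size / 256.
    return base_lr * batch_size / 256

def lr_at_step(step, total_steps, warmup_steps, base_lr, batch_size=2048):
    """Linear warm-up followed by cosine decay down to 1/1000 of the peak rate."""
    peak = scaled_lr(base_lr, batch_size)
    if step < warmup_steps:
        # Linear warm-up over the first 10 epochs' worth of steps.
        return peak * step / warmup_steps
    # Cosine decay from peak to peak/1000 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    floor = peak / 1000
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))

print(scaled_lr(0.2))     # 1.6    -> peak rate for the weights
print(scaled_lr(0.0048))  # 0.0384 -> peak rate for biases and batch-norm parameters
```

In practice the weights and the bias/batch-norm parameters would sit in separate optimizer parameter groups, each driven by this schedule with its own base rate.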