Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations

Authors: Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, Luc Van Gool

NeurIPS 2021

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | Our results show that an approach like MoCo [19] works surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets. Second, given the generality of the approach, we try to realize further gains with minor modifications. We show that learning additional invariances through the use of multi-scale cropping, stronger augmentations and nearest neighbors improves the representations. Finally, we observe that MoCo learns spatially structured representations when trained with a multi-crop strategy.

Researcher Affiliation | Academia | 1 KU Leuven/ESAT-PSI, 2 ETH Zurich/CVL

Pseudocode | Yes | Algorithm 1: Pseudocode for kNN-MoCo

Open Source Code | Yes | The code and models are available². Footnote 2: Code: https://github.com/wvangansbeke/Revisiting-Contrastive-SSL

Open Datasets | Yes | We train MoCo-v2 [7] on a variety of datasets. Table 1 shows an overview. The representations are evaluated on six downstream tasks: linear classification, semantic segmentation, object detection, video instance segmentation and depth estimation. We adopt the following target datasets for linear classification: CIFAR10 [27], Food-101 [26], Pets [35], Places365 [56], Stanford Cars [26], SUN397 [48] and VOC 2007 [16]. The semantic segmentation task is evaluated on Cityscapes [10], PASCAL VOC [16] and NYUD [40]. We use PASCAL VOC [16] for object detection. The DAVIS2017 benchmark [37] is used for video instance segmentation. Finally, depth estimation is performed on NYUD [40].

Dataset Splits | No | The paper mentions using 'train splits' for some datasets (e.g., 'The complete train splits are used for COCO and BDD100K.') and evaluates on various benchmark datasets. While standard practice for these benchmarks often includes validation sets, the paper does not explicitly provide specific percentages, sample counts, or detailed methodology for splitting data into training, validation, and test sets for its own experiments.

Hardware Specification | No | The paper describes training parameters like 'pretrained for 400 epochs using batches of size 256', but it does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments.

Software Dependencies | No | The paper mentions 'PyTorch' in a footnote ('We use the RandomResizedCrop in PyTorch...'), but it does not provide specific version numbers for PyTorch or any other software dependencies required to replicate the experiments.

Experiment Setup | Yes | The model, i.e. a ResNet-50 backbone, is pretrained for 400 epochs using batches of size 256. The initial learning rate is set to 0.3 and decayed using a cosine schedule. We use the default values for the temperature (τ = 0.2) and momentum (m = 0.999) hyperparameters. Further, we consider two additional modifications. First, we enforce the smaller crops to overlap with the anchor image... Second, since increasing the number of views facilitates faster training, we reduce the momentum hyperparameter m from 0.999 to 0.995. We pretrain for 200 epochs on COCO.
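
The multi-scale cropping named in the Research Type excerpt above pairs one large anchor crop with several smaller crops of the same image. As a rough illustration, here is a minimal PyTorch sketch: the crop sizes, scale ranges, and crop count are our own assumptions, and the paper's constraint that the smaller crops overlap with the anchor is not enforced here.

```python
# Minimal multi-crop sketch (assumed sizes/scales, not the authors' exact values).
from torchvision import transforms

def make_multi_crop(num_small=4):
    shared = [transforms.RandomHorizontalFlip(), transforms.ToTensor()]
    # one large "anchor" crop ...
    anchor = transforms.Compose(
        [transforms.RandomResizedCrop(224, scale=(0.2, 1.0))] + shared)
    # ... plus several smaller crops at lower resolution
    small = transforms.Compose(
        [transforms.RandomResizedCrop(96, scale=(0.05, 0.2))] + shared)

    def apply(img):
        # NOTE: the paper additionally enforces overlap between the small
        # crops and the anchor image; that constraint is omitted here.
        return [anchor(img)] + [small(img) for _ in range(num_small)]

    return apply
```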
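The Pseudocode entry refers to Algorithm 1, kNN-MoCo, which augments MoCo's contrastive objective with nearest neighbors drawn from the memory bank. The sketch below is our paraphrase of that idea, not the authors' code: the encoder names (`f_q`, `f_k`), the multi-positive loss form, and the choice to average log-probabilities over the key and its k neighbors are all assumptions; consult the repository linked above for the exact algorithm.

```python
# Hedged sketch of a kNN-MoCo-style step: the key's top-k neighbours in the
# queue are treated as additional positives (our reading of the idea; the
# authors' Algorithm 1 may differ in detail).
import torch
import torch.nn.functional as F

def knn_moco_loss(f_q, f_k, x_q, x_k, queue, tau=0.2, k=5):
    """x_q, x_k: two augmented views (B x C x H x W); queue: K x D features."""
    q = F.normalize(f_q(x_q), dim=1)            # queries, B x D
    with torch.no_grad():                       # momentum encoder, no gradients
        key = F.normalize(f_k(x_k), dim=1)      # keys, B x D

    l_pos = (q * key).sum(dim=1, keepdim=True)  # B x 1, query-key similarity
    l_neg = q @ queue.t()                       # B x K, query-queue similarities

    # indices of each key's k nearest neighbours inside the queue
    nn_idx = (key @ queue.t()).topk(k, dim=1).indices   # B x k

    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    log_prob = F.log_softmax(logits, dim=1)

    # treat the key and its k queue neighbours as positives and average
    pos = torch.cat([log_prob[:, :1],
                     torch.gather(log_prob[:, 1:], 1, nn_idx)], dim=1)
    return -pos.mean()
```

A standard MoCo queue update (enqueue the current keys, dequeue the oldest entries) would follow each training step.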
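The Experiment Setup excerpt gives enough detail to reconstruct the pretraining schedule. Below is a minimal sketch of that schedule; the optimizer settings (SGD with momentum 0.9 and weight decay 1e-4) follow common MoCo-v2 defaults and are an assumption, as the excerpt does not state them.

```python
# Pretraining schedule sketch: 400 epochs, batch size 256, lr 0.3 with cosine
# decay, as quoted above. SGD momentum 0.9 and weight decay 1e-4 are MoCo-v2
# defaults and an assumption here, not taken from the excerpt.
import torch
from torchvision.models import resnet50

encoder_q = resnet50()   # query encoder (ResNet-50 backbone, random init)
encoder_k = resnet50()   # key encoder, updated by exponential moving average

optimizer = torch.optim.SGD(encoder_q.parameters(), lr=0.3,
                            momentum=0.9, weight_decay=1e-4)
epochs = 400
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

m = 0.999  # EMA momentum; the paper lowers it to 0.995 for multi-crop runs

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m):
    """EMA update of the key encoder, applied after every training step."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.mul_(m).add_(p_q, alpha=1 - m)

for epoch in range(epochs):
    # ... one pass over the pretraining set in batches of 256: forward both
    # encoders, compute the contrastive loss (temperature τ = 0.2),
    # optimizer.step(), then momentum_update(encoder_q, encoder_k, m) ...
    scheduler.step()
```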