Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations

Authors: Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, Luc Van Gool

NeurIPS 2021

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | Our results show that an approach like MoCo [19] works surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets. Second, given the generality of the approach, we try to realize further gains with minor modifications. We show that learning additional invariances through the use of multi-scale cropping, stronger augmentations and nearest neighbors improves the representations. Finally, we observe that MoCo learns spatially structured representations when trained with a multi-crop strategy.

Researcher Affiliation | Academia | 1 KU Leuven/ESAT-PSI, 2 ETH Zurich/CVL

Pseudocode | Yes | Algorithm 1: Pseudocode for kNN-MoCo

Open Source Code | Yes | The code and models are available². Footnote 2: Code: https://github.com/wvangansbeke/Revisiting-Contrastive-SSL

Open Datasets | Yes | We train MoCo-v2 [7] on a variety of datasets. Table 1 shows an overview. The representations are evaluated on six downstream tasks: linear classification, semantic segmentation, object detection, video instance segmentation and depth estimation. We adopt the following target datasets for linear classification: CIFAR10 [27], Food-101 [26], Pets [35], Places365 [56], Stanford Cars [26], SUN397 [48] and VOC 2007 [16]. The semantic segmentation task is evaluated on Cityscapes [10], PASCAL VOC [16] and NYUD [40]. We use PASCAL VOC [16] for object detection. The DAVIS2017 benchmark [37] is used for video instance segmentation. Finally, depth estimation is performed on NYUD [40].

Dataset Splits | No | The paper mentions using 'train splits' for some datasets (e.g., 'The complete train splits are used for COCO and BDD100K.') and evaluates on various benchmark datasets. While standard practice for these benchmarks often includes validation sets, the paper does not explicitly provide specific percentages, sample counts, or detailed methodology for splitting data into training, validation, and test sets for its own experiments.

Hardware Specification | No | The paper describes training parameters like 'pretrained for 400 epochs using batches of size 256', but it does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments.

Software Dependencies | No | The paper mentions 'PyTorch' in a footnote ('We use the RandomResizedCrop in PyTorch...'), but it does not provide specific version numbers for PyTorch or any other software dependencies required to replicate the experiments.

Experiment Setup | Yes | The model, i.e. a ResNet-50 backbone, is pretrained for 400 epochs using batches of size 256. The initial learning rate is set to 0.3 and decayed using a cosine schedule. We use the default values for the temperature (τ = 0.2) and momentum (m = 0.999) hyperparameters. Further, we consider two additional modifications. First, we enforce the smaller crops to overlap with the anchor image... Second, since increasing the number of views facilitates faster training, we reduce the momentum hyperparameter m from 0.999 to 0.995. We pretrain for 200 epochs on COCO.
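
The multi-scale cropping named in the Research Type excerpt above pairs one large anchor crop with several smaller crops of the same image. As a rough illustration, here is a minimal PyTorch sketch: the crop sizes, scale ranges, and crop count are our own assumptions, and the paper's constraint that the smaller crops overlap with the anchor is not enforced here.

```python
# Minimal multi-crop sketch (assumed sizes/scales, not the authors' exact values).
from torchvision import transforms

def make_multi_crop(num_small=4):
    shared = [transforms.RandomHorizontalFlip(), transforms.ToTensor()]
    # one large "anchor" crop ...
    anchor = transforms.Compose(
        [transforms.RandomResizedCrop(224, scale=(0.2, 1.0))] + shared)
    # ... plus several smaller crops at lower resolution
    small = transforms.Compose(
        [transforms.RandomResizedCrop(96, scale=(0.05, 0.2))] + shared)

    def apply(img):
        # NOTE: the paper additionally enforces overlap between the small
        # crops and the anchor image; that constraint is omitted here.
        return [anchor(img)] + [small(img) for _ in range(num_small)]

    return apply
```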
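The Pseudocode entry refers to Algorithm 1, kNN-MoCo, which augments MoCo's contrastive objective with nearest neighbors drawn from the memory bank. The sketch below is our paraphrase of that idea, not the authors' code: the encoder names (`f_q`, `f_k`), the multi-positive loss form, and the choice to average log-probabilities over the key and its k neighbors are all assumptions; consult the repository linked above for the exact algorithm.

```python
# Hedged sketch of a kNN-MoCo-style step: the key's top-k neighbours in the
# queue are treated as additional positives (our reading of the idea; the
# authors' Algorithm 1 may differ in detail).
import torch
import torch.nn.functional as F

def knn_moco_loss(f_q, f_k, x_q, x_k, queue, tau=0.2, k=5):
    """x_q, x_k: two augmented views (B x C x H x W); queue: K x D features."""
    q = F.normalize(f_q(x_q), dim=1)            # queries, B x D
    with torch.no_grad():                       # momentum encoder, no gradients
        key = F.normalize(f_k(x_k), dim=1)      # keys, B x D

    l_pos = (q * key).sum(dim=1, keepdim=True)  # B x 1, query-key similarity
    l_neg = q @ queue.t()                       # B x K, query-queue similarities

    # indices of each key's k nearest neighbours inside the queue
    nn_idx = (key @ queue.t()).topk(k, dim=1).indices   # B x k

    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    log_prob = F.log_softmax(logits, dim=1)

    # treat the key and its k queue neighbours as positives and average
    pos = torch.cat([log_prob[:, :1],
                     torch.gather(log_prob[:, 1:], 1, nn_idx)], dim=1)
    return -pos.mean()
```

A standard MoCo queue update (enqueue the current keys, dequeue the oldest entries) would follow each training step.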
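The Experiment Setup excerpt gives enough detail to reconstruct the pretraining schedule. Below is a minimal sketch of that schedule; the optimizer settings (SGD with momentum 0.9 and weight decay 1e-4) follow common MoCo-v2 defaults and are an assumption, as the excerpt does not state them.

```python
# Pretraining schedule sketch: 400 epochs, batch size 256, lr 0.3 with cosine
# decay, as quoted above. SGD momentum 0.9 and weight decay 1e-4 are MoCo-v2
# defaults and an assumption here, not taken from the excerpt.
import torch
from torchvision.models import resnet50

encoder_q = resnet50()   # query encoder (ResNet-50 backbone, random init)
encoder_k = resnet50()   # key encoder, updated by exponential moving average

optimizer = torch.optim.SGD(encoder_q.parameters(), lr=0.3,
                            momentum=0.9, weight_decay=1e-4)
epochs = 400
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

m = 0.999  # EMA momentum; the paper lowers it to 0.995 for multi-crop runs

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m):
    """EMA update of the key encoder, applied after every training step."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.mul_(m).add_(p_q, alpha=1 - m)

for epoch in range(epochs):
    # ... one pass over the pretraining set in batches of 256: forward both
    # encoders, compute the contrastive loss (temperature τ = 0.2),
    # optimizer.step(), then momentum_update(encoder_q, encoder_k, m) ...
    scheduler.step()
```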