Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases
Authors: Senthil Purushwalkam, Abhinav Gupta
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that approaches like MOCO [1] and PIRL [2] learn occlusion-invariant representations. However, they fail to capture viewpoint and category instance invariance, which are crucial components for object recognition. Second, we demonstrate that these approaches obtain further gains from access to a clean object-centric training dataset like ImageNet. Finally, we propose an approach to leverage unstructured videos to learn representations that possess higher viewpoint invariance. Our results show that the learned representations outperform MOCOv2 trained on the same data in terms of invariances encoded and the performance on downstream image classification and semantic segmentation tasks. |
| Researcher Affiliation | Collaboration | Senthil Purushwalkam Carnegie Mellon University EMAIL Abhinav Gupta Carnegie Mellon University & Facebook AI Research EMAIL |
| Pseudocode | No | The paper describes methods and equations (e.g., Equation 1), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'will publicly release the code to reproduce the invariance evaluation metrics on these datasets,' which is a future promise. The project webpage linked in the paper (http://www.cs.cmu.edu/~spurushw/publication/demystifyssl/) also states 'Code will be released soon.' |
| Open Datasets | Yes | We use the training set of the GOT-10K tracking dataset [35]... We use the PASCAL3D+ dataset [36]... The ALOI dataset [37] contains images of 1000 objects... Contrastive self-supervised approaches are most commonly trained on the ImageNet dataset... We pretrain self-supervised models on the MSCOCO dataset [40]... We evaluate this baseline by training MOCOv2 on frames extracted from TrackingNet [41] videos... We also evaluate on the task of semantic segmentation on ADE20K [44]... |
| Dataset Splits | No | The paper mentions using various standard datasets like ImageNet, MSCOCO, and Pascal, and discusses training and evaluation. It mentions training on '118K MSCOCO images' or 'a randomly sampled 10% subset of ImageNet,' but it does not explicitly provide exact percentages, absolute sample counts, or specific citations for train/validation/test splits for any of the datasets used. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components and frameworks like 'ResNet,' 'Linear SVMs,' and 'ROIPooling,' but it does not specify their version numbers or other crucial software dependencies required for replication. |
| Experiment Setup | No | The paper mentions 'τ is a hyperparameter called temperature' and refers to supplementary material for 'additional implementation details' and 'more concrete implementation details,' but it does not provide specific hyperparameter values, training configurations, or system-level settings within the main text. |
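The Experiment Setup row notes that the paper introduces τ only as "a hyperparameter called temperature" without giving its value. For context, temperature in contrastive self-supervised learning scales the similarity logits inside an InfoNCE-style loss before the softmax. The sketch below is a generic illustration of that role, not the paper's implementation; the function name, the use of NumPy, and the default `temperature=0.07` are illustrative assumptions.

```python
import numpy as np

def info_nce_loss(query, positive, negatives, temperature=0.07):
    """Illustrative InfoNCE loss for a single query (not the paper's code).

    Cross-entropy over cosine similarities of the query to one positive
    key and K negative keys, with logits divided by the temperature.
    A lower temperature sharpens the softmax, penalizing hard negatives
    more strongly.
    """
    def normalize(x):
        # L2-normalize so dot products are cosine similarities.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    q = normalize(np.asarray(query, dtype=float))
    keys = normalize(np.vstack([positive, negatives]))  # positive is row 0
    logits = keys @ q / temperature                     # shape (1 + K,)
    logits -= logits.max()                              # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                            # target class is 0
```

Because τ divides every logit, halving it doubles the gap between the positive and negative similarities before the softmax, which is why reported results are typically sensitive to this unreported value.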