What Makes for Good Views for Contrastive Learning?

Authors: Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid, Phillip Isola

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use theoretical and empirical analysis to better understand the importance of view selection, and argue that we should reduce the mutual information (MI) between views while keeping task-relevant information intact. To verify this hypothesis, we devise unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI. We also consider data augmentation as a way to reduce MI, and show that increasing data augmentation indeed leads to decreasing MI and improves downstream classification accuracy. As a byproduct, we achieve a new state-of-the-art accuracy on unsupervised pre-training for ImageNet classification (73% top-1 linear readout with a ResNet-50).
Researcher Affiliation | Collaboration | Yonglong Tian (MIT CSAIL); Chen Sun (Google, Brown University); Ben Poole (Google Research); Dilip Krishnan (Google Research); Cordelia Schmid (Google Research); Phillip Isola (MIT CSAIL)
Pseudocode | No | The paper does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper mentions a 'Project page: http://hobbitlong.github.io/InfoMin' but does not explicitly state that source code is provided or link directly to a source-code repository within the text.
Open Datasets | Yes | As a byproduct, we achieve a new state-of-the-art accuracy on unsupervised pre-training for ImageNet classification (73% top-1 linear readout with a ResNet-50). We experiment with RGB and YDbDr. Experiments are conducted on STL-10, which includes 100k unlabeled and 5k labeled images. After the contrastive training stage, we evaluate on STL-10 and CIFAR-10 by freezing the encoder and training a linear classifier. Segmentation performance is reported on NYU-V2 [40] images. We build our toy dataset by combining Moving MNIST [51] (consisting of videos where digits move inside a black canvas with constant speed and bounce off of image boundaries) with a fixed background image sampled from the STL-10 dataset [10]. Motivated by the InfoMin principle, we propose a new set of data augmentations, called InfoMin Aug. In combination with the JigSaw strategy proposed in PIRL [38], our InfoMin Aug achieves 73.0% top-1 accuracy on the ImageNet linear readout benchmark with ResNet-50. Transferring our unsupervisedly pre-trained models to PASCAL VOC object detection and COCO instance segmentation consistently outperforms supervised ImageNet pre-training.
Dataset Splits | No | The paper mentions using well-known datasets such as ImageNet, STL-10, and CIFAR-10, and states counts for STL-10 (100k unlabeled and 5k labeled images), but it does not explicitly provide training/validation/test split percentages or absolute counts for every split, nor does it cite predefined splits within the provided text.
Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments, such as GPU models, CPU models, or memory specifications.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) required to replicate the experiments.
Experiment Setup | Yes | Table 1: Single-crop ImageNet accuracies (%) of linear classifiers [63] trained on representations learned with different contrastive methods using ResNet-50 [24]. InfoMin Aug. refers to data augmentation using RandomResizedCrop, Color Jittering, Gaussian Blur, RandAugment, Color Dropping, and a JigSaw branch as in PIRL [38]. InfoMin Aug. (Ours), with a ResNet-50 (24M parameters) and an MLP head, reaches 70.1% top-1 / 89.4% top-5 after 200 epochs and 73.0% top-1 / 91.1% top-5 after 800 epochs. We create views by randomly cropping two patches of size 64x64 from the same image with various offsets. Practically, the flow-based model g is restricted to pixel-wise 1x1 convolutions and ReLU activations, operating independently on each pixel. We try both volume-preserving (VP) and non-volume-preserving (NVP) flows.
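
The InfoMin argument summarized under Research Type rests on the fact that a contrastive objective maximizes a lower bound on the mutual information between two views. As a point of reference, the sketch below shows a minimal InfoNCE-style loss between two batches of view embeddings; the function name `info_nce` and the temperature value are illustrative choices, not taken from the paper's implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Minimal InfoNCE loss for a batch of paired view embeddings.

    z1 and z2 have shape (batch, dim); z1[i] and z2[i] are embeddings of two
    views of the same image. Minimizing this loss maximizes a lower bound on
    the mutual information between the two views' representations.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                    # (batch, batch) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)  # positive pairs lie on the diagonal
    return F.cross_entropy(logits, targets)
```

In use, the two views produced by augmentation (or by a learned view generator) would be encoded and passed to this loss, e.g. `loss = info_nce(encoder(view1), encoder(view2))`.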
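The Experiment Setup row lists the InfoMin Aug. recipe (RandomResizedCrop, color jittering, Gaussian blur, RandAugment, color dropping, plus a PIRL-style JigSaw branch). Below is a rough torchvision approximation of that pipeline, assuming torchvision >= 0.11 for RandAugment; the probabilities, magnitudes, and transform order are placeholder values rather than the paper's exact settings, and the JigSaw branch and the learned flow-based views are omitted.

```python
from torchvision import transforms

# Approximate InfoMin Aug. pipeline (placeholder parameters, order illustrative).
infomin_aug = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),  # color jittering
    transforms.RandomGrayscale(p=0.2),                                            # "color dropping"
    transforms.RandomApply([transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0))], p=0.5),
    transforms.RandAugment(),                                                      # torchvision >= 0.11
    transforms.ToTensor(),
])
```

Each training image would be passed through such a pipeline twice to produce the two views fed to the contrastive loss.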