What Should Not Be Contrastive in Contrastive Learning

Authors: Tete Xiao, Xiaolong Wang, Alexei A. Efros, Trevor Darrell

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train our model on the 100-category ImageNet (IN-100) dataset... We test the models on various downstream datasets... Table 1: Classification accuracy... Table 2: Evaluation on multiple downstream tasks... Table 3: Evaluation on datasets of real-world corruptions... Ablation: MoCo w/ all augmentations vs. LooC.
Researcher Affiliation | Academia | Tete Xiao (UC Berkeley), Xiaolong Wang (UC San Diego), Alexei A. Efros (UC Berkeley), Trevor Darrell (UC Berkeley)
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | No explicit statement regarding the release of source code for the described methodology, or a link to a code repository, was found.
Open Datasets | Yes | We train our model on the 100-category ImageNet (IN-100) dataset, a subset of the ImageNet (Deng et al., 2009) dataset, for fast ablation studies of the proposed framework. We split the subset following (Tian et al., 2019).
Dataset Splits | Yes | IN-100 validation set;... The iNaturalist 2019 (iNat-1k) dataset (Van Horn et al., 2018)... We randomly reallocate 10% of training images into the validation set as the original validation set is relatively small.
Hardware Specification | No | No specific hardware details such as GPU models (e.g., NVIDIA A100), CPU models (e.g., Intel Core i7), or cloud instance types were mentioned for running experiments. The paper only refers to general training processes without specifying the computational infrastructure.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'CUDA 11.1') were found in the paper.
Experiment Setup | Yes | We train the network for 500 epochs, and decrease the learning rate at 300 and 400 epochs. We use separate queues (He et al., 2020) for individual embedding space and set the queue size to 16,384. ...The batch size during training of the backbone and the linear layer is set to 256. ...We train the linear layer for 200 epochs for IN-100 and CUB-200, 100 epochs for iNat-1k, optimized by momentum SGD with a learning rate of 30 decreased by 0.1 at 60% and 80% of training schedule; for Flowers-102 we train the linear layer with Adam optimizer for 250 iterations with a learning rate of 0.03.
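
For the Open Datasets row above: the paper builds IN-100 as a 100-class subset of ImageNet, with the class split taken from Tian et al. (2019). Below is a minimal sketch of how such a subset could be assembled, assuming a hypothetical in100_classes.txt file listing the 100 WordNet IDs and a standard ImageNet folder layout; PyTorch/torchvision is an assumption, since the paper does not name its software stack.

# Sketch: assemble an IN-100 subset from a full ImageNet directory.
# The file in100_classes.txt (one WordNet ID per line) is hypothetical; the
# actual 100-class split comes from Tian et al. (2019).
from pathlib import Path
from torchvision import datasets, transforms

def build_in100(imagenet_root: str, class_list_path: str, split: str = "train"):
    wanted = set(Path(class_list_path).read_text().split())
    transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.ToTensor(),
    ])
    full = datasets.ImageFolder(f"{imagenet_root}/{split}", transform=transform)
    # Keep only images whose class folder (WordNet ID) is in the 100-class list.
    keep = {full.class_to_idx[c] for c in wanted if c in full.class_to_idx}
    full.samples = [(path, label) for path, label in full.samples if label in keep]
    full.targets = [label for _, label in full.samples]
    # Note: labels keep their original ImageNet indices; remap them if a
    # contiguous 0-99 range is needed downstream.
    return full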
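
For the Dataset Splits row: the paper reallocates 10% of the iNat-1k training images into the validation set. A minimal sketch of that reallocation, assuming PyTorch's random_split and a fixed seed (both assumptions; only the 10% fraction is stated in the paper):

# Sketch: move a random 10% of the training set into validation, then merge
# it with the original (relatively small) validation set.
import torch
from torch.utils.data import random_split, ConcatDataset

def reallocate_validation(train_set, val_set, val_fraction: float = 0.10, seed: int = 0):
    n_extra = int(len(train_set) * val_fraction)
    n_train = len(train_set) - n_extra
    generator = torch.Generator().manual_seed(seed)  # assumed: fixed seed for reproducibility
    train_subset, extra_val = random_split(train_set, [n_train, n_extra], generator=generator)
    return train_subset, ConcatDataset([val_set, extra_val])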
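
For the Experiment Setup row: the quoted linear-evaluation schedule (momentum SGD, learning rate 30, decayed by 0.1 at 60% and 80% of training; Adam at learning rate 0.03 for Flowers-102) maps onto a standard optimizer/scheduler pair. A minimal sketch, assuming PyTorch, momentum 0.9, zero weight decay, and a frozen ResNet-50 backbone with 2048-d features (all assumptions not stated in this excerpt):

# Sketch: linear-probe optimizer and step schedule for IN-100 / CUB-200 / iNat-1k.
import torch
import torch.nn as nn

def linear_eval_optimizer(linear: nn.Linear, epochs: int):
    # Learning rate 30 and the 0.1 decay at 60% / 80% of training are from the paper;
    # momentum 0.9 and weight decay 0 are assumptions.
    optimizer = torch.optim.SGD(linear.parameters(), lr=30.0, momentum=0.9, weight_decay=0.0)
    milestones = [int(0.6 * epochs), int(0.8 * epochs)]
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=milestones, gamma=0.1)
    return optimizer, scheduler

# 200 epochs for IN-100 and CUB-200, 100 epochs for iNat-1k (batch size 256):
# linear = nn.Linear(2048, num_classes)
# optimizer, scheduler = linear_eval_optimizer(linear, epochs=200)
# Flowers-102 instead uses Adam for 250 iterations at learning rate 0.03:
# optimizer = torch.optim.Adam(linear.parameters(), lr=0.03)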