Conditional Negative Sampling for Contrastive Learning of Visual Representations

Authors: Mike Wu, Milan Mosse, Chengxu Zhuang, Daniel Yamins, Noah Goodman

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentally, we find our approach, applied on top of existing models (IR, CMC, and MoCo), improves accuracy by 2-5% absolute points in each case, measured by linear evaluation on four standard image benchmarks.
Researcher Affiliation | Academia | Departments of Computer Science, Psychology, and Philosophy, Stanford University. {wumike, chengxuz, mmosse19, yamins, ngoodman}@stanford.edu
Pseudocode | Yes | Algorithm 1: MoCo Ring
Open Source Code | No | The paper mentions using and adapting code from third-party tools such as Detectron2 and PyTorch Lightning, but does not provide a link or an explicit statement about the availability of its own source code for the conditional negative sampling method.
Open Datasets | Yes | We explore our method applied to IR, CMC, and MoCo in four commonly used visual datasets. ... The results for CIFAR10, CIFAR100, STL10, and ImageNet are in Table 1. ... suite of image datasets from the Meta-Dataset collection (Triantafillou et al., 2019).
Dataset Splits | Yes | As in prior work (Wu et al., 2018; Zhuang et al., 2019; He et al., 2019; Misra & Maaten, 2020; Hénaff et al., 2019; Kolesnikov et al., 2019; Donahue & Simonyan, 2019; Bachman et al., 2019; Tian et al., 2019; Chen et al., 2020a), we evaluate each method by linear classification on frozen embeddings. That is, we optimize a contrastive objective on a pretraining dataset to learn a representation; then, using a transfer dataset, we fit logistic regression on the representations only.
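The linear-evaluation protocol quoted above (a logistic regression fit on frozen embeddings, with the encoder never updated) can be sketched as follows. This is an illustrative stand-in, not the paper's code: the synthetic 128-dimensional features and their class separation are assumptions, chosen only to mirror the paper's 128-dimensional encoder output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen embeddings: in the paper these would come from a
# pretrained encoder with 128-d output; here they are synthetic, with
# class 1 shifted along one dimension so a linear probe can succeed.
n, d = 200, 128
X = rng.normal(size=(n, d))
y = (rng.random(n) < 0.5).astype(float)
X[y == 1, 0] += 2.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit logistic regression by plain gradient descent on the log loss.
# Only (w, b) are trained; the "encoder" that produced X stays frozen.
w, b = np.zeros(d), 0.0
lr = 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= lr * (X.T @ (p - y) / n)
    b -= lr * float(np.mean(p - y))

# Linear-probe accuracy on the embeddings.
acc = float(np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1)))
```

In the actual protocol this classifier would be fit on a transfer dataset's embeddings and evaluated on its held-out split; the sketch only shows the "classifier on frozen features" structure.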
Hardware Specification | Yes | We used a single Titan X GPU with 8 CPU workers, and PyTorch Lightning (Falcon et al., 2019).
Software Dependencies | No | The paper mentions "PyTorch Lightning (Falcon et al., 2019)" but does not specify a version number for PyTorch Lightning or any other software dependency.
Experiment Setup | Yes | We pick the upper percentile u = 10 and the lower percentile = 1, although we anneal u starting from 100. We resize input images to 256 by 256 pixels and normalize them using the dataset mean and standard deviation. The temperature is set to 0.07. ... We use a ResNet-18 encoder with an output dimension of 128. ... In pretraining, we use SGD with learning rate 0.03, momentum 0.9, and weight decay 1e-4 for 300 epochs with batch size 256 (128 for CMC). We drop the learning rate twice by a factor of 10, at epochs 200 and 250. In transfer, we use SGD with learning rate 0.01, momentum 0.9, and no weight decay for 100 epochs without dropping the learning rate.
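The pretraining learning-rate schedule quoted above (start at 0.03, drop by 10x at epochs 200 and 250 over 300 epochs) can be written as a small step-schedule helper. This is a minimal sketch of that schedule only; the function name and signature are illustrative, not from the paper.

```python
def pretrain_lr(epoch, base_lr=0.03, drop_epochs=(200, 250), factor=0.1):
    """Step learning-rate schedule from the quoted setup: base rate 0.03,
    multiplied by 0.1 at each drop epoch (200 and 250 of 300 total)."""
    lr = base_lr
    for e in drop_epochs:
        if epoch >= e:
            lr *= factor
    return lr
```

In a PyTorch training loop the same behavior is usually obtained with `torch.optim.SGD(..., lr=0.03, momentum=0.9, weight_decay=1e-4)` together with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[200, 250], gamma=0.1)`.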