The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning

Authors: Zhenmei Shi, Jiefeng Chen, Kunyang Li, Jayaram Raghuram, Xi Wu, Yingyu Liang, Somesh Jha

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our analysis and method empirically with systematic experiments using real-world datasets and foundation models.
Researcher Affiliation | Collaboration | 1 University of Wisconsin-Madison; 2 Google LLC; 3 XaiPient. Equal contribution. {zhmeishi,jiefeng,kli253,jayaramr,yliang,jha}@cs.wisc.edu, wuxi@google.com
Pseudocode | No | The paper describes methods and equations, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Please refer to our released code for more details. https://github.com/zhmeishi/trade-off_contrastive_learning
Open Datasets | Yes | The CIFAR-10 (Krizhevsky et al., 2009) dataset consists of 60,000 32×32 color images in 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck. Each class has 6,000 images. There are 50,000 training images and 10,000 test images.
Dataset Splits | Yes | There are 50,000 training images and 10,000 test images. ... Then we fix the pre-trained feature extractor and train a linear classifier (Linear Probing, LP) on 1%, 5%, 10%, 20%, and 100% of the labeled data from the downstream task.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments.
Software Dependencies | No | The paper mentions optimizers (SGD, AdamW) and learning-rate schedulers, but does not provide specific software names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup | Yes | We pre-train a ResNet18 network (He et al., 2016) as a feature extractor under different contrastive learning methods using SGD for 800 epochs with a cosine learning-rate scheduler, a base learning rate of 0.06, weight decay 5e-4, momentum 0.9, and batch size 512. Then we fix the pre-trained feature extractor and train a linear classifier (Linear Probing, LP) on 1%, 5%, 10%, 20%, and 100% of the labeled data from the downstream task. For LP we use SGD for 200 epochs with a cosine learning-rate scheduler, a base learning rate of 5.0, no weight decay, momentum 0.9, and batch size 256. (Illustrative sketches of this setup follow the table.)
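
The Experiment Setup row reports the pre-training hyperparameters but not a full training loop. The sketch below is a minimal PyTorch skeleton that wires those reported settings (SGD, 800 epochs, cosine schedule, base learning rate 0.06, weight decay 5e-4, momentum 0.9) into a generic contrastive pre-training loop. The SimCLR-style NT-Xent loss, the 128-dimensional projection output, and the tiny placeholder data loader are assumptions made only for illustration; the paper evaluates several contrastive methods, and the authors' released code should be treated as the reference implementation.

import torch
import torch.nn.functional as F
from torchvision.models import resnet18

def nt_xent_loss(z1, z2, temperature=0.5):
    # z1, z2: [N, d] embeddings of two augmented views of the same batch.
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)          # [2N, d]
    sim = z @ z.t() / temperature                                # pairwise cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))                   # exclude self-similarity
    # For row i < N the positive is i + N, and vice versa.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Standard torchvision ResNet-18; the final fc layer is reused here as a 128-d projection.
# CIFAR-specific stem modifications (3x3 first conv, no max-pool) are omitted for brevity.
encoder = resnet18(num_classes=128)

# Optimizer and scheduler follow the reported pre-training configuration.
optimizer = torch.optim.SGD(encoder.parameters(), lr=0.06,
                            momentum=0.9, weight_decay=5e-4)
epochs = 800
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

# Placeholder loader yielding (two augmented views, ignored labels); the real pipeline
# applies contrastive augmentations to CIFAR-10 with batch size 512.
views = (torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32))
loader = [(views, None)]

for epoch in range(epochs):
    for (view1, view2), _ in loader:
        z1, z2 = encoder(view1), encoder(view2)
        loss = nt_xent_loss(z1, z2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()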
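
The Dataset Splits and Experiment Setup rows describe linear probing on fractions of the labeled downstream data with a frozen feature extractor. The following sketch shows that protocol for a 1% label budget on CIFAR-10 using the reported LP hyperparameters (SGD, 200 epochs, cosine schedule, base learning rate 5.0, no weight decay, momentum 0.9, batch size 256). The untrained torchvision ResNet-18 stands in for the contrastively pre-trained encoder, and the uniform random 1% subsample is an assumption; the exact subsampling and checkpoints are in the linked repository.

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Subset, DataLoader
from torchvision import datasets, transforms
from torchvision.models import resnet18

# Frozen feature extractor (stand-in for the pre-trained ResNet-18 from the previous sketch).
encoder = resnet18(num_classes=10)
encoder.fc = nn.Identity()            # expose the 512-d penultimate features
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()

# Take 1% of the CIFAR-10 training labels (downloads the dataset on first run).
train_set = datasets.CIFAR10(root="data", train=True, download=True,
                             transform=transforms.ToTensor())
rng = np.random.default_rng(0)
frac = 0.01
idx = rng.choice(len(train_set), size=int(frac * len(train_set)), replace=False)
loader = DataLoader(Subset(train_set, idx.tolist()), batch_size=256, shuffle=True)

# Linear probe trained with the reported LP hyperparameters.
probe = nn.Linear(512, 10)
optimizer = torch.optim.SGD(probe.parameters(), lr=5.0, momentum=0.9, weight_decay=0.0)
epochs = 200
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
criterion = nn.CrossEntropyLoss()

for epoch in range(epochs):
    for images, labels in loader:
        with torch.no_grad():
            feats = encoder(images)   # features are fixed; only the probe is updated
        loss = criterion(probe(feats), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()

The same loop applies to the 5%, 10%, 20%, and 100% budgets by changing `frac`.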