The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning
Authors: Zhenmei Shi, Jiefeng Chen, Kunyang Li, Jayaram Raghuram, Xi Wu, Yingyu Liang, Somesh Jha
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our analysis and method empirically with systematic experiments using real-world datasets and foundation models. |
| Researcher Affiliation | Collaboration | 1 University of Wisconsin-Madison, 2 Google LLC, 3 XaiPient. Equal contribution. {zhmeishi,jiefeng,kli253,jayaramr,yliang,jha}@cs.wisc.edu, wuxi@google.com |
| Pseudocode | No | The paper describes methods and equations, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Please refer to our released code1 for more details. 1https://github.com/zhmeishi/trade-off_contrastive_learning |
| Open Datasets | Yes | CIFAR-10 (Krizhevsky et al., 2009) dataset consists of 60,000 32×32 color images in 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck. Each class has 6,000 images. There are 50,000 training images and 10,000 test images. |
| Dataset Splits | Yes | There are 50,000 training images and 10,000 test images. ... Then we fix the pre-trained feature extractor and train a linear classifier (Linear Probing, LP) on 1%, 5%, 10%, 20%, 100% of the labeled data from the downstream task. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper mentions optimizers (SGD, AdamW) and learning rate schedulers, but does not provide specific software names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | Yes | We pre-train a ResNet18 network (He et al., 2016) as a feature extractor under different contrastive learning methods using SGD for 800 epochs with a cosine learning-rate scheduler, a base learning rate of 0.06, weight decay 5e-4, momentum 0.9 and batch size 512. Then we fix the pre-trained feature extractor and train a linear classifier (Linear Probing, LP) on 1%, 5%, 10%, 20%, 100% of the labeled data from the downstream task. For LP we use SGD for 200 epochs with a cosine learning-rate scheduler, a base learning rate of 5.0, no weight decay, momentum 0.9, and batch size 256. |
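
The Experiment Setup row above amounts to a standard optimizer and scheduler configuration. Below is a minimal PyTorch sketch of those hyperparameters; it is not the authors' released code, the 128-dim projection head and the 10-class linear head are assumptions, and data loading and training loops are omitted.

```python
import torch
from torchvision.models import resnet18

# Contrastive pre-training: SGD, 800 epochs, cosine learning-rate schedule,
# base lr 0.06, weight decay 5e-4, momentum 0.9 (batch size 512 in the paper).
encoder = resnet18(num_classes=128)  # 128-dim projection output is an assumption
pretrain_opt = torch.optim.SGD(encoder.parameters(),
                               lr=0.06, momentum=0.9, weight_decay=5e-4)
pretrain_sched = torch.optim.lr_scheduler.CosineAnnealingLR(pretrain_opt, T_max=800)

# Linear probing (LP): freeze the pre-trained encoder and train only a linear
# head with SGD, 200 epochs, cosine schedule, base lr 5.0, no weight decay
# (batch size 256 in the paper).
for p in encoder.parameters():
    p.requires_grad = False
linear_head = torch.nn.Linear(512, 10)  # 512-dim ResNet18 penultimate features, 10 CIFAR-10 classes
lp_opt = torch.optim.SGD(linear_head.parameters(),
                         lr=5.0, momentum=0.9, weight_decay=0.0)
lp_sched = torch.optim.lr_scheduler.CosineAnnealingLR(lp_opt, T_max=200)
```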
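
The Dataset Splits row also notes that LP is trained on 1%, 5%, 10%, 20%, and 100% of the downstream labels. The sketch below shows one way to draw such labeled subsets; uniform random sampling over the CIFAR-10 training set is an assumption, since the exact subset-selection procedure lives in the released code linked above.

```python
import numpy as np
from torch.utils.data import Subset
from torchvision.datasets import CIFAR10

train_set = CIFAR10(root="./data", train=True, download=True)
rng = np.random.default_rng(0)  # fixed seed so the subsets are reproducible

# Labeled fractions used for linear probing in the table above.
labeled_subsets = {}
for frac in (0.01, 0.05, 0.10, 0.20, 1.00):
    n = int(frac * len(train_set))  # 500, 2500, 5000, 10000, 50000 images
    idx = rng.choice(len(train_set), size=n, replace=False)
    labeled_subsets[frac] = Subset(train_set, idx.tolist())
```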