Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning
Authors: Mayee Chen, Daniel Y. Fu, Avanika Narayan, Michael Zhang, Zhao Song, Kayvon Fatahalian, Christopher Ré
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate THANOS on two tasks designed to evaluate how well it preserves subclasses: Coarse-to-fine transfer learning trains a model to classify superclasses but uses the representations to distinguish subclasses. THANOS outperforms SupCon by 11.1 points on average across 5 standard datasets. Worst-group robustness evaluates how well a model can identify underperforming sub-groups and maintain high performance on them. THANOS identifies underperforming sub-groups 7.7 points better than previous work (Sohoni et al., 2020) and achieves 4.7 points of lift on worst-group robustness across 3 datasets, setting state-of-the-art on CelebA by 11.5 points. (See the linear-probe sketch after the table for the coarse-to-fine protocol.) |
| Researcher Affiliation | Collaboration | Department of Computer Science, Stanford University; Adobe Research. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/HazyResearch/thanos-code/. |
| Open Datasets | Yes | We use coarse versions of CIFAR10, CIFAR100, MNIST, and Tiny ImageNet to study coarse-to-fine transfer. We use Waterbirds, ISIC, and CelebA for robustness (Sagawa et al., 2019; Codella et al., 2019; Liu et al., 2015; Sohoni et al., 2020). |
| Dataset Splits | No | The paper describes the datasets and training process but does not explicitly state the train/validation/test splits used, nor does it cite specific predefined splits. |
| Hardware Specification | Yes | All transfer experiments were run using Tesla V100 machines. |
| Software Dependencies | No | We use the implementation in PyTorch Lightning Bolts (Falcon & Cho, 2020). |
| Experiment Setup | Yes | For the coarse dataset training, all models were trained for 600 epochs with an initial learning rate of 0.0003, a cosine annealing learning rate scheduler with T_max set to 100, and the AdamW optimizer. A dropout rate of 0.05 was used. We did not use weight decay. All experiments were run using a batch size of 128 for both training and evaluation. (See the configuration sketch after the table.) |
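
The coarse-to-fine transfer protocol quoted in the Research Type row (train the encoder on superclass labels, then use its frozen representations to distinguish subclasses) can be summarized as a linear-probe evaluation. The sketch below is an illustration only: `encoder` is a placeholder for a pretrained model, the dataloaders are assumed to yield subclass (fine) labels, and the paper's actual evaluation code in the linked repository may use a different probe.

```python
import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def extract_features(encoder, loader, device="cuda"):
    """Run a frozen encoder over a dataloader and collect features and labels."""
    encoder.eval().to(device)
    feats, labels = [], []
    for x, y in loader:
        feats.append(encoder(x.to(device)).cpu())
        labels.append(y)
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

def coarse_to_fine_probe(encoder, train_loader, test_loader):
    """Fit a linear probe on subclass (fine) labels using representations
    from an encoder that was trained only on superclass (coarse) labels."""
    X_tr, y_tr = extract_features(encoder, train_loader)
    X_te, y_te = extract_features(encoder, test_loader)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)  # subclass accuracy of the probe
```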
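
The hyperparameters in the Experiment Setup row map directly onto a standard PyTorch optimizer and scheduler configuration. A minimal sketch, assuming a placeholder encoder and omitting the contrastive training step itself; this is not the paper's training code:

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Values taken from the Experiment Setup row above.
EPOCHS, BATCH_SIZE, LR, T_MAX, DROPOUT = 600, 128, 3e-4, 100, 0.05

encoder = torch.nn.Sequential(          # stand-in for the paper's encoder
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 32 * 32, 512),
    torch.nn.ReLU(),
    torch.nn.Dropout(p=DROPOUT),        # dropout rate of 0.05
    torch.nn.Linear(512, 128),
)
optimizer = AdamW(encoder.parameters(), lr=LR, weight_decay=0.0)  # no weight decay
scheduler = CosineAnnealingLR(optimizer, T_max=T_MAX)             # cosine annealing, T_max = 100

for epoch in range(EPOCHS):
    # ... one pass over the coarse-label training set with batch size 128 ...
    scheduler.step()
```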