How to Fine-Tune Vision Models with SGD

Authors: Ananya Kumar, Ruoqi Shen, Sébastien Bubeck, Suriya Gunasekar

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We examine 7 popular models, including vision transformers (Dosovitskiy et al., 2021; Caron et al., 2021; Radford et al., 2021), ConvNeXts (Liu et al., 2022), and ResNets (Kolesnikov et al., 2020; He et al., 2016), of different sizes and pretraining modalities. When pretrained on a large corpus and then fine-tuned, these models achieve near state-of-the-art performance on downstream benchmarks. In addition to good transfer learning, we also want our fine-tuned models to handle practical distribution shifts gracefully. So we focus on 5 distribution shift datasets that have both in-distribution (ID) and out-of-distribution (OOD) evaluations: WILDS-FMoW, WILDS-Camelyon, Waterbirds, BREEDS-Living-17, DomainNet. These were selected to capture different types of data shifts (subpopulation shifts, spurious correlations, style shifts), including two real-world shifts in medical imaging and satellite remote sensing from the WILDS benchmark (Koh et al., 2021).
Researcher Affiliation | Collaboration | Ananya Kumar (ananya@cs.stanford.edu), Ruoqi Shen (shenr3@cs.washington), Sébastien Bubeck (sebubeck@microsoft.com), Suriya Gunasekar (suriyag@microsoft.com)
Pseudocode | No | The paper describes the update rules for SGD and AdamW using equations and textual explanations (e.g., 'For SGD, in addition to gradients g_t and weights θ^(t), we maintain first order momentum estimate m_t as optimizer state...'), but it does not include any formal pseudocode blocks or algorithm listings. A hedged sketch of these update rules appears after this table.
Open Source Code | No | The paper references third-party tools like 'OpenCLIP (Ilharco et al., 2021)' and provides its URL, but it does not provide a specific link or explicit statement confirming the release of its own source code for the methodology or experiments described in the paper.
Open Datasets | Yes | We focus on 5 distribution shift datasets that have both in-distribution (ID) and out-of-distribution (OOD) evaluations: WILDS-FMoW, WILDS-Camelyon, Waterbirds, BREEDS-Living-17, DomainNet. These were selected to capture different types of data shifts (subpopulation shifts, spurious correlations, style shifts), including two real-world shifts in medical imaging and satellite remote sensing from the WILDS benchmark (Koh et al., 2021).
Dataset Splits | Yes | For all datasets we follow prior work (Kumar et al., 2022) and pick the best learning rate and early stop based on the ID validation accuracy.
Hardware Specification | Yes | The profiling was done using Weights and Biases on a Titan-X GPU with micro-batch size of 1.
Software Dependencies | No | The paper mentions using 'PyTorch' for the SGD and AdamW implementations, the 'sklearn' library for logistic regression, and 'Weights and Biases' for profiling, but it does not specify any version numbers for these software components (e.g., PyTorch 1.x, scikit-learn 0.xx).
Experiment Setup | Yes | For each method, we train for the same number of epochs using a cosine learning rate schedule, and sweep over 6 starting learning rates (ensuring that the optimal learning rate is in the middle of the sweep). For all datasets we follow prior work (Kumar et al., 2022) and pick the best learning rate and early stop based on the ID validation accuracy. See Appendix A for additional details. A sketch of this sweep-and-select protocol appears after this table.
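
Since the Pseudocode row notes that the paper states the SGD and AdamW update rules only as equations and prose, the following minimal Python sketch illustrates the two rules it refers to. All hyperparameter defaults (lr, beta, wd, ...) and variable names are illustrative assumptions, not values taken from the paper.

```python
import math

# Hedged sketch of the two update rules discussed in the Pseudocode row.
# Scalar parameters are used for brevity; defaults are illustrative
# assumptions, not the paper's settings.

def sgd_momentum_step(theta, g, m, lr=1e-3, beta=0.9):
    """SGD with momentum: m_t = beta * m_{t-1} + g_t,
    then theta^{(t+1)} = theta^{(t)} - lr * m_t."""
    m = beta * m + g
    return theta - lr * m, m

def adamw_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """AdamW: bias-corrected first/second moment estimates,
    plus decoupled weight decay applied directly to theta (t >= 1)."""
    m = b1 * m + (1.0 - b1) * g
    v = b2 * v + (1.0 - b2) * g * g
    m_hat = m / (1.0 - b1 ** t)   # bias-corrected first moment
    v_hat = v / (1.0 - b2 ** t)   # bias-corrected second moment
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * theta)
    return theta, m, v

# Tiny usage example on a scalar weight:
theta, m = 1.0, 0.0
theta, m = sgd_momentum_step(theta, g=0.5, m=m)
```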
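The Experiment Setup and Dataset Splits rows describe a simple selection protocol: sweep 6 starting learning rates under a cosine schedule and keep the checkpoint with the best ID validation accuracy. The sketch below shows one way to structure that loop; the learning-rate grid, epoch count, and the evaluate_id_val stand-in are hypothetical placeholders, not the authors' code.

```python
import math
import random

def cosine_lr(epoch: int, total_epochs: int, base_lr: float) -> float:
    """Cosine learning-rate schedule decaying from base_lr toward 0."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))

def evaluate_id_val(model_state) -> float:
    """Hypothetical stand-in for ID validation accuracy; a real run
    would score the model on the held-out in-distribution split."""
    return random.random()

num_epochs = 20
lr_grid = [3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2]  # illustrative 6-point sweep

best = {"acc": -1.0, "lr": None, "epoch": None}
for base_lr in lr_grid:
    model_state = {}  # placeholder for freshly initialized weights
    for epoch in range(num_epochs):
        lr = cosine_lr(epoch, num_epochs, base_lr)
        # ... one fine-tuning epoch at learning rate `lr` would run here ...
        acc = evaluate_id_val(model_state)
        # "Early stopping" here means keeping the best ID-val checkpoint
        # across all epochs and starting learning rates.
        if acc > best["acc"]:
            best = {"acc": acc, "lr": base_lr, "epoch": epoch}

print(f"selected lr={best['lr']} at epoch {best['epoch']} "
      f"(ID val acc={best['acc']:.3f})")
```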