Robust Fine-tuning of Zero-shot Models via Variance Reduction

Authors: Beier Zhu, Jiequan Cui, Hanwang Zhang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On ImageNet and five derived distribution shifts, our VRF further improves the OOD accuracy by 1.5–2.0 pp over the ensemble baselines while maintaining or increasing ID accuracy. VRF achieves similarly large robustness gains (0.9–3.1 pp) on other distribution shift benchmarks.
Researcher Affiliation | Academia | Beier Zhu, Jiequan Cui, Hanwang Zhang, Nanyang Technological University; beier002@e.ntu.edu.sg, hanwangzhang@ntu.edu.sg
Pseudocode | Yes | Algorithm 1: Variance Reduction Fine-tuning
Open Source Code | Yes | Code is available at https://github.com/BeierZhu/VRF.
Open Datasets | Yes | CIFAR-10 [13]: MIT License, https://www.cs.toronto.edu/~kriz/cifar.html. STL-10 [2]: Non-commercial, https://cs.stanford.edu/~acoates/stl10/. Entity-30 [23]: Non-commercial, https://github.com/MadryLab/BREEDS-Benchmarks. ImageNet [3]: Non-commercial, http://image-net.org. IN-V2 [21]: MIT License, https://github.com/modestyachts/ImageNetV2. IN-R [7]: MIT License, https://github.com/hendrycks/imagenet-r. IN-Sketch [27]: MIT License, https://github.com/HaohanWang/ImageNet-Sketch. IN-A [9]: MIT License, https://github.com/hendrycks/natural-adv-examples. ObjectNet [1]: Creative Commons Attribution 4.0, https://objectnet.dev.
Dataset Splits | Yes | Note that all the hyperparameters, e.g., α, a, b, are searched using the accuracy on the in-distribution (ID) validation set. Derived distribution-shift datasets are used only for evaluation, not for hyperparameter sweeps (see the search-protocol sketch after the table).
Hardware Specification | Yes | The batch size for training CLIP ViT-16 based LP-FT models is set to 384, which is the largest batch size that fits into 2 A6000 GPUs.
Software Dependencies | No | When fine-tuning E2E-FT models, we adhere to Wortsman et al. [28], employing the default PyTorch AdamW optimizer for 10 epochs with weight decay of 0.1 and a cosine-annealing learning rate schedule with 500 warm-up steps. Unless specified, we use a learning rate of 3 × 10⁻⁵ and gradient clipping at norm 1. When fine-tuning LP-FT, we first adopt the settings of Wortsman et al. [28] to train the linear classifier, then fully fine-tune the model at a learning rate of 1 × 10⁻⁵. For efficient k-NN search, we use the Faiss library [11] (see the Faiss sketch after the table). No specific version numbers for PyTorch or Faiss are explicitly stated.
Experiment Setup | Yes | When fine-tuning E2E-FT models, we adhere to Wortsman et al. [28], employing the default PyTorch AdamW optimizer for 10 epochs with weight decay of 0.1 and a cosine-annealing learning rate schedule with 500 warm-up steps. Unless specified, we use a learning rate of 3 × 10⁻⁵ and gradient clipping at norm 1 (see the training-configuration sketch after the table).
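
The training configuration quoted in the Software Dependencies and Experiment Setup rows is concrete enough to express in code. Below is a minimal PyTorch sketch, assuming a standard linear-warmup-plus-cosine schedule and a generic training loop; the helper names (`make_optimizer_and_scheduler`, `train`) and the exact warm-up handling are illustrative assumptions, not the authors' implementation.

```python
import math
import torch

# Hyperparameters quoted in the report (E2E-FT setting of Wortsman et al. [28]).
LR = 3e-5            # learning rate
WEIGHT_DECAY = 0.1   # AdamW weight decay
EPOCHS = 10
WARMUP_STEPS = 500   # linear warm-up before cosine annealing
CLIP_NORM = 1.0      # gradient clipping at norm 1

def make_optimizer_and_scheduler(model, steps_per_epoch):
    """AdamW plus linear warm-up followed by cosine annealing (illustrative)."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)
    total_steps = EPOCHS * steps_per_epoch

    def lr_lambda(step):
        if step < WARMUP_STEPS:
            return step / max(1, WARMUP_STEPS)
        progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler

def train(model, loader, loss_fn, device="cuda"):
    optimizer, scheduler = make_optimizer_and_scheduler(model, len(loader))
    model.train()
    for _ in range(EPOCHS):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            loss = loss_fn(model(images), labels)
            optimizer.zero_grad()
            loss.backward()
            # Clip gradients at norm 1, as stated in the quoted setup.
            torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP_NORM)
            optimizer.step()
            scheduler.step()
```

The batch size (384 per the Hardware Specification row) would be set on the DataLoader; the report describes it as a memory-bound choice for 2 A6000 GPUs in the LP-FT setting rather than a tuned value.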
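
The Faiss dependency is used for k-NN search. A minimal sketch of an exact L2 k-NN lookup with Faiss, assuming pre-extracted float32 feature matrices; `train_features` and `query_features` are placeholder names, and the report does not specify which Faiss index type the authors use.

```python
import faiss
import numpy as np

def knn_search(train_features: np.ndarray, query_features: np.ndarray, k: int = 5):
    """Exact L2 k-NN over feature vectors with Faiss (illustrative)."""
    d = train_features.shape[1]                   # feature dimensionality
    index = faiss.IndexFlatL2(d)                  # exact (non-approximate) L2 index
    index.add(train_features.astype(np.float32))  # index the reference features
    distances, indices = index.search(query_features.astype(np.float32), k)
    return distances, indices                     # both of shape (n_queries, k)
```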
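
The Dataset Splits row describes the tuning protocol: hyperparameters such as α, a, b are selected by ID validation accuracy only, with the derived distribution shifts held out. A generic grid-search sketch of that protocol, assuming a hypothetical `evaluate_id_accuracy(alpha, a, b)` callback and placeholder grids (not the authors' search space):

```python
import itertools

def search_hparams(evaluate_id_accuracy, alphas, a_grid, b_grid):
    """Select (alpha, a, b) by in-distribution validation accuracy only (illustrative)."""
    best_acc, best_cfg = -1.0, None
    for alpha, a, b in itertools.product(alphas, a_grid, b_grid):
        acc = evaluate_id_accuracy(alpha, a, b)   # accuracy on the ID validation set
        if acc > best_acc:
            best_acc, best_cfg = acc, (alpha, a, b)
    return best_cfg, best_acc                     # OOD sets are touched only at test time
```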