Fine-Tuning is Fine, if Calibrated

Authors: Zheda Mai, Arpita Chowdhury, Ping Zhang, Cheng-Hao Tu, Hong-You Chen, Vardaan Pahuja, Tanya Berger-Wolf, Song Gao, Charles Stewart, Yu Su, Wei-Lun (Harry) Chao

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct an extensive empirical study to demonstrate the robustness of our findings and provide preliminary explanations underlying them, suggesting new directions for future theoretical analysis. Our code is available at https://github.com/OSU-MLB/Fine-Tuning-Is-Fine-If-Calibrated.
Researcher Affiliation | Academia | Zheda Mai1, Arpita Chowdhury1, Ping Zhang1, Cheng-Hao Tu1, Hong-You Chen1, Vardaan Pahuja1, Tanya Berger-Wolf1, Song Gao2, Charles Stewart3, Yu Su1, Wei-Lun Chao1. 1The Ohio State University, 2University of Wisconsin-Madison, 3Rensselaer Polytechnic Institute.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It includes mathematical equations and derivations but no structured algorithmic descriptions.
Open Source Code | Yes | Our code is available at https://github.com/OSU-MLB/Fine-Tuning-Is-Fine-If-Calibrated.
Open Datasets | Yes | We focus on two of the largest datasets used in [49]. We also consider the ImageNet Distribution Shift benchmark widely used in out-of-distribution (OOD) generalization [57]. 1. Office-Home [53]: a domain adaptation dataset with 65 classes and 4 domains. ... 2. VTAB [64]: a set of 19 visual recognition datasets. ... 3. ImageNet-R [15] and ImageNet-S [54]: datasets for OOD detection and generalization [57].
Dataset Splits | No | Office-Home: Within each downstream domain, each class is randomly split into training and testing sets following a 7:3 split; 30 classes are randomly selected as fine-tuning classes and the remaining 35 classes are absent classes. ImageNet-Variants: Each class is randomly divided into training and testing sets following an 8:2 split; 50% of the classes are randomly selected as fine-tuning classes (100 for ImageNet-R and 500 for ImageNet-S), with the remainder absent. The paper describes train and test sets but does not specify a separate validation split for model tuning in the standard sense; it mentions 'pseudo cross-validation' for parameter selection, but that is a method rather than a fixed dataset split for overall model evaluation. A sketch of this class-wise split procedure is given after the table.
Hardware Specification | Yes | We use a combination of NVIDIA RTX A6000 and NVIDIA 2080Ti GPUs. Since we worked on fine-tuning, the computation is quite manageable.
Software Dependencies | No | The paper mentions various models (ResNet-50, ViT-B/32, CLIP) and optimizers (SGD, Adam, AdaBelief, Adadelta, AdaGrad, RMSprop) but does not provide specific version numbers for any software dependencies or libraries used.
Experiment Setup | Yes | For the ImageNet-Variants benchmark, we use an ImageNet-1K pre-trained ResNet-50 (results in the main paper) and ViT-B/32 (results in the appendix) as pre-trained models. The pre-trained model is fine-tuned on downstream tasks for 50 epochs using the SGD optimizer with a learning rate 1e-3, momentum 0.9, weight decay 1e-4, and batch size 64. A minimal sketch of this fine-tuning recipe follows the table.
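
The class-wise split described in the Dataset Splits row can be approximated as follows. This is a minimal sketch, assuming samples arrive as (image_path, class_id) pairs and that only fine-tuning classes contribute training data while the test set keeps all classes; the function name and defaults (7:3 split, 30 fine-tuning classes, matching Office-Home) are illustrative and not taken from the authors' repository.

```python
import random
from collections import defaultdict

def split_downstream_data(samples, train_frac=0.7, num_finetune_classes=30, seed=0):
    """Per-class train/test split plus random selection of fine-tuning
    vs. absent classes (defaults follow the Office-Home description)."""
    rng = random.Random(seed)

    # Group samples by class so the train/test split is stratified per class.
    by_class = defaultdict(list)
    for path, cls in samples:  # samples: iterable of (image_path, class_id)
        by_class[cls].append((path, cls))

    # Randomly choose the fine-tuning classes; the remainder are absent classes.
    classes = sorted(by_class)
    finetune_classes = set(rng.sample(classes, num_finetune_classes))
    absent_classes = set(classes) - finetune_classes

    train, test = [], []
    for cls, items in by_class.items():
        rng.shuffle(items)
        cut = int(len(items) * train_frac)
        # Assumption: only fine-tuning classes provide training data,
        # while the test set retains both fine-tuning and absent classes.
        if cls in finetune_classes:
            train.extend(items[:cut])
        test.extend(items[cut:])

    return train, test, finetune_classes, absent_classes
```

For the ImageNet-Variants setting, the same sketch would be called with train_frac=0.8 and num_finetune_classes set to half the class count (100 for ImageNet-R, 500 for ImageNet-S).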
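
The fine-tuning recipe from the Experiment Setup row can likewise be sketched in PyTorch. This only mirrors the reported hyperparameters (SGD, lr 1e-3, momentum 0.9, weight decay 1e-4, batch size 64, 50 epochs); the dataset object and number of classes are placeholders, and the paper's calibration step is not shown here.

```python
import torch
import torchvision

def finetune_resnet50(train_set, num_classes, device="cuda"):
    """Fine-tune an ImageNet-1K pre-trained ResNet-50 with the
    hyperparameters reported in the Experiment Setup row."""
    model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
    # Replace the classification head with one sized for the downstream classes.
    model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
    model = model.to(device)

    loader = torch.utils.data.DataLoader(
        train_set, batch_size=64, shuffle=True, num_workers=4
    )
    optimizer = torch.optim.SGD(
        model.parameters(), lr=1e-3, momentum=0.9, weight_decay=1e-4
    )
    criterion = torch.nn.CrossEntropyLoss()

    model.train()
    for _ in range(50):  # 50 epochs
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```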