Fine-Tuning is Fine, if Calibrated

Authors: Zheda Mai, Arpita Chowdhury, Ping Zhang, Cheng-Hao Tu, Hong-You Chen, Vardaan Pahuja, Tanya Berger-Wolf, Song Gao, Charles Stewart, Yu Su, Wei-Lun (Harry) Chao

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct an extensive empirical study to demonstrate the robustness of our findings and provide preliminary explanations underlying them, suggesting new directions for future theoretical analysis. Our code is available at https://github.com/OSU-MLB/Fine-Tuning-Is-Fine-If-Calibrated.
Researcher Affiliation | Academia | Zheda Mai1, Arpita Chowdhury1, Ping Zhang1, Cheng-Hao Tu1, Hong-You Chen1, Vardaan Pahuja1, Tanya Berger-Wolf1, Song Gao2, Charles Stewart3, Yu Su1, Wei-Lun Chao1. 1The Ohio State University, 2University of Wisconsin-Madison, 3Rensselaer Polytechnic Institute.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It includes mathematical equations and derivations but no structured algorithmic descriptions.
Open Source Code | Yes | Our code is available at https://github.com/OSU-MLB/Fine-Tuning-Is-Fine-If-Calibrated.
Open Datasets | Yes | We focus on two of the largest datasets used in [49]. We also consider the ImageNet Distribution Shift benchmark widely used in out-of-distribution (OOD) generalization [57]. 1. Office-Home [53]: a domain adaptation dataset with 65 classes and 4 domains. ... 2. VTAB [64]: a set of 19 visual recognition datasets. ... 3. ImageNet-R [15] and ImageNet-S [54]: datasets for OOD detection and generalization [57].
Dataset Splits | No | Office-Home: Within each downstream domain, each class is randomly split into training and testing sets following a 7:3 split; 30 classes are randomly selected as fine-tuning classes and the remaining 35 classes are absent classes. ImageNet-Variants: Each class is randomly divided into training and testing sets following an 8:2 split; 50% of the classes are randomly selected as fine-tuning classes (100 for ImageNet-R and 500 for ImageNet-S), with the remainder absent. The paper describes train and test sets but does not specify a separate validation split for model tuning in the standard sense; it mentions 'pseudo cross-validation' for parameter selection, but that is a method rather than a fixed dataset split for overall model evaluation. A sketch of this class-wise split procedure is given after the table.
Hardware Specification | Yes | We use a combination of NVIDIA RTX A6000 and NVIDIA 2080Ti GPUs. Since we worked on fine-tuning, the computation is quite manageable.
Software Dependencies | No | The paper mentions various models (ResNet-50, ViT-B/32, CLIP) and optimizers (SGD, Adam, AdaBelief, Adadelta, AdaGrad, RMSprop) but does not provide specific version numbers for any software dependencies or libraries used.
Experiment Setup | Yes | For the ImageNet-Variants benchmark, we use an ImageNet-1K pre-trained ResNet-50 (results in the main paper) and ViT-B/32 (results in the appendix) as pre-trained models. The pre-trained model is fine-tuned on downstream tasks for 50 epochs using the SGD optimizer with a learning rate 1e-3, momentum 0.9, weight decay 1e-4, and batch size 64. A minimal sketch of this fine-tuning recipe follows the table.
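
The class-wise split described in the Dataset Splits row can be approximated as follows. This is a minimal sketch, assuming samples arrive as (image_path, class_id) pairs and that only fine-tuning classes contribute training data while the test set keeps all classes; the function name and defaults (7:3 split, 30 fine-tuning classes, matching Office-Home) are illustrative and not taken from the authors' repository.

```python
import random
from collections import defaultdict

def split_downstream_data(samples, train_frac=0.7, num_finetune_classes=30, seed=0):
    """Per-class train/test split plus random selection of fine-tuning
    vs. absent classes (defaults follow the Office-Home description)."""
    rng = random.Random(seed)

    # Group samples by class so the train/test split is stratified per class.
    by_class = defaultdict(list)
    for path, cls in samples:  # samples: iterable of (image_path, class_id)
        by_class[cls].append((path, cls))

    # Randomly choose the fine-tuning classes; the remainder are absent classes.
    classes = sorted(by_class)
    finetune_classes = set(rng.sample(classes, num_finetune_classes))
    absent_classes = set(classes) - finetune_classes

    train, test = [], []
    for cls, items in by_class.items():
        rng.shuffle(items)
        cut = int(len(items) * train_frac)
        # Assumption: only fine-tuning classes provide training data,
        # while the test set retains both fine-tuning and absent classes.
        if cls in finetune_classes:
            train.extend(items[:cut])
        test.extend(items[cut:])

    return train, test, finetune_classes, absent_classes
```

For the ImageNet-Variants setting, the same sketch would be called with train_frac=0.8 and num_finetune_classes set to half the class count (100 for ImageNet-R, 500 for ImageNet-S).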
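
The fine-tuning recipe from the Experiment Setup row can likewise be sketched in PyTorch. This only mirrors the reported hyperparameters (SGD, lr 1e-3, momentum 0.9, weight decay 1e-4, batch size 64, 50 epochs); the dataset object and number of classes are placeholders, and the paper's calibration step is not shown here.

```python
import torch
import torchvision

def finetune_resnet50(train_set, num_classes, device="cuda"):
    """Fine-tune an ImageNet-1K pre-trained ResNet-50 with the
    hyperparameters reported in the Experiment Setup row."""
    model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
    # Replace the classification head with one sized for the downstream classes.
    model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
    model = model.to(device)

    loader = torch.utils.data.DataLoader(
        train_set, batch_size=64, shuffle=True, num_workers=4
    )
    optimizer = torch.optim.SGD(
        model.parameters(), lr=1e-3, momentum=0.9, weight_decay=1e-4
    )
    criterion = torch.nn.CrossEntropyLoss()

    model.train()
    for _ in range(50):  # 50 epochs
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```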