Transferring Knowledge From Large Foundation Models to Small Downstream Models

Authors: Shikai Qiu, Boran Han, Danielle C. Maddix, Shuai Zhang, Bernie Wang, Andrew Gordon Wilson

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Across multiple vision, language, and multi-modal datasets, AFT achieves significantly better downstream performance compared to alternatives with a similar computational cost."
Researcher Affiliation | Collaboration | "(1) AWS AI Labs, Santa Clara, CA, USA; (2) Department of Computer Science, New York University, NYC, USA"
Pseudocode | Yes | "Algorithm 1 Adaptive Feature Transfer (AFT)" (an illustrative sketch of the idea follows the table)
Open Source Code | Yes | "Our code is available at https://github.com/amazon-science/adaptive-feature-transfer."
Open Datasets | Yes | "on CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), Oxford Flowers-102 (Nilsback & Zisserman, 2008), Oxford-IIIT Pets (Parkhi et al., 2012), Describable Textures Dataset (DTD) (Cimpoi et al., 2014) and Food-101 (Bossard et al., 2014) datasets."
Dataset Splits | Yes | "We tune the hyperparameter β for AFT, KD, and B-Tuning in all experiments by holding out 10% of the original training set and selecting the β value that yields the highest accuracy on this holdout set." (selection loop sketched after the table)
Hardware Specification | Yes | "Table 1 compares the runtime on an NVIDIA A100 GPU for training ViT-S/16 (22M parameters) for one epoch on CIFAR-100..."
Software Dependencies | No | The paper mentions using 'timm' and the Hugging Face implementations for models but does not provide specific version numbers for these or other software libraries or dependencies.
Experiment Setup | Yes | "We use the Adam optimizer in all experiments and train for 5000 steps (rounded up to whole epochs) with a batch size of 128 and a cosine lr decay schedule. We use a base learning rate of 1e-4 for ViT-S/16 and MLP-Mixer-B, and 1e-3 for ResNet-50." (configuration sketched after the table)
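
The Pseudocode row refers to Algorithm 1 (AFT). As a rough, non-authoritative illustration of the general idea rather than the paper's exact objective, the sketch below adds a feature-transfer regularizer that encourages the small downstream model's features to be predictable from the frozen foundation model's features through a learned linear map, scaled by β; `FeatureTransferLoss` and `training_step` are hypothetical names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureTransferLoss(nn.Module):
    """Illustrative feature-transfer regularizer (not the paper's exact Algorithm 1):
    penalize the part of the student's features that is not predicted by a learned
    linear map of the frozen pretrained (teacher) features."""

    def __init__(self, teacher_dim: int, student_dim: int, beta: float):
        super().__init__()
        # the map is learned jointly with the downstream model
        self.proj = nn.Linear(teacher_dim, student_dim, bias=False)
        self.beta = beta

    def forward(self, student_feats: torch.Tensor, teacher_feats: torch.Tensor) -> torch.Tensor:
        # teacher features come from the frozen foundation model; no gradient flows into it
        pred = self.proj(teacher_feats.detach())
        return self.beta * F.mse_loss(student_feats, pred)


def training_step(student, head, transfer_loss, x, y, teacher_feats):
    """Hypothetical training step: standard task loss plus the transfer regularizer."""
    feats = student(x)                      # features of the small downstream model
    loss = F.cross_entropy(head(feats), y)  # task loss
    return loss + transfer_loss(feats, teacher_feats)
```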
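
The Dataset Splits row describes selecting β on a 10% holdout of the original training set. A minimal sketch of that selection loop, assuming a hypothetical `train_and_evaluate(train_set, train_idx, holdout_idx, beta)` helper that trains on `train_idx` and returns accuracy on `holdout_idx`:

```python
import random

def select_beta(train_set, candidate_betas, train_and_evaluate, holdout_frac=0.1, seed=0):
    """Pick the beta with the highest accuracy on a held-out 10% of the training set."""
    indices = list(range(len(train_set)))
    random.Random(seed).shuffle(indices)
    n_holdout = int(holdout_frac * len(indices))
    holdout_idx, train_idx = indices[:n_holdout], indices[n_holdout:]

    best_beta, best_acc = None, float("-inf")
    for beta in candidate_betas:
        acc = train_and_evaluate(train_set, train_idx, holdout_idx, beta)  # hypothetical helper
        if acc > best_acc:
            best_beta, best_acc = beta, acc
    return best_beta
```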
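
The Experiment Setup row quotes the optimizer and schedule. A minimal PyTorch sketch of that configuration, assuming the model and training set already exist (Adam, cosine learning-rate decay over 5000 steps, batch size 128; base learning rate 1e-4 for ViT-S/16 and MLP-Mixer-B, 1e-3 for ResNet-50); `build_training_setup` is a hypothetical name:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader

def build_training_setup(model, train_set, base_lr=1e-4, total_steps=5000, batch_size=128):
    # base_lr: 1e-4 for ViT-S/16 and MLP-Mixer-B, 1e-3 for ResNet-50 (per the quoted setup)
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)
    scheduler = CosineAnnealingLR(optimizer, T_max=total_steps)  # cosine lr decay over training
    return loader, optimizer, scheduler
```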