Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How

Authors: Sebastian Pineda Arango, Fabio Ferreira, Arlind Kadra, Frank Hutter, Josif Grabocka

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To this aim, we evaluated over 20k hyperparameter configurations for finetuning 24 pretrained image classification models on 87 datasets to generate a large-scale meta-dataset. We empirically demonstrate that our resulting approach can quickly select an accurate pretrained model for a new dataset together with its optimal hyperparameters."
Researcher Affiliation | Academia | "Sebastian Pineda Arango, Fabio Ferreira, Arlind Kadra, Frank Hutter & Josif Grabocka, Department of Computer Science, University of Freiburg, pineda@cs.uni-freiburg.de"
Pseudocode | Yes | "We formalize these steps in Algorithm 1, where we use the validation loss as a performance metric. The entire procedure is sped up by starting from a meta-learned surrogate as described in Section 4.4. Algorithm 1: Quick-Tune Algorithm. We present the procedure for meta-training the cost and loss predictors in Algorithm 2. Algorithm 2: Meta-training Algorithm." (A hedged sketch of this loop appears after the table.)
Open Source Code | Yes | "To facilitate reproducibility, we open-source our code and release our meta-dataset." (Footnote 1: https://github.com/releaunifreiburg/QuickTune)
Open Datasets | Yes | "In our experiments, we use the tasks contained in the Meta-Album benchmark (Ullah et al., 2022) since it contains a diverse set of computer vision datasets."
Dataset Splits | Yes | "When searching for a pipeline on datasets of a given fold Di, we consider one of the remaining folds for meta-validation and the remaining ones for meta-training." (See the fold-rotation sketch after the table.)
Hardware Specification | Yes | "The configurations trained on the tasks from micro, mini, extended are finetuned for 1, 4, and 16 hours respectively, using a single NVIDIA GeForce RTX 2080 Ti GPU per finetuning task, amounting to a total compute time of 32 GPU months."
Software Dependencies | No | "We base our study on the timm library (Wightman, 2019)... We use the Synetune library (Salinas et al., 2022) for the implementation of the baselines." The paper names the timm and Synetune libraries but does not specify their version numbers. (See the version-recording sketch after the table.)
Experiment Setup | Yes | "The chosen setup uses an MLP with 2 hidden layers and 32 neurons per layer, for both predictors. We use the Adam optimizer with a learning rate of 10^-4 for fitting the estimators during the BO steps. We update their parameters for 100 epochs for every iteration from Algorithm 1. Further details on the set-up are specified in Appendix A.2. Table 7: Detailed Search Space for Curve Generation." (See the predictor sketch after the table.)
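
To make the Pseudocode entry more concrete, below is a minimal sketch of the kind of cost-aware Bayesian-optimization loop that Algorithm 1 describes: a loss predictor and a cost predictor score candidate (model, hyperparameter) pipelines, the most promising pipeline is finetuned for one more step, and the observation is fed back to the predictors. The acquisition rule, class interfaces, and names (`loss_model`, `cost_model`, `finetune_one_step`) are illustrative assumptions, not the authors' released API.

```python
# Hedged sketch of a Quick-Tune-style loop (Algorithm 1 in the paper).
# All interfaces below are illustrative assumptions, not the released code's API.
import numpy as np

def quick_tune(candidates, loss_model, cost_model, finetune_one_step, budget):
    """Repeatedly pick the pipeline with the best predicted improvement per
    predicted cost, observe its validation loss, and refit both predictors
    (the paper warm-starts them from meta-learned weights)."""
    history = []            # observed (pipeline, val_loss, cost) triples
    best_loss = np.inf      # incumbent validation loss (first pick is arbitrary)
    spent = 0.0
    while spent < budget:
        # Acquisition: predicted improvement over the incumbent per unit of cost.
        pred_loss = np.asarray(loss_model.predict(candidates))   # assumed interface
        pred_cost = np.asarray(cost_model.predict(candidates))   # assumed interface
        scores = np.maximum(best_loss - pred_loss, 0.0) / np.maximum(pred_cost, 1e-8)
        chosen = candidates[int(np.argmax(scores))]

        # Finetune the chosen pipeline for one more step/epoch and observe it.
        val_loss, cost = finetune_one_step(chosen)
        spent += cost
        best_loss = min(best_loss, val_loss)
        history.append((chosen, val_loss, cost))

        # Refit the loss and cost predictors on all observations so far.
        loss_model.fit(history)
        cost_model.fit(history)
    return best_loss, history
```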
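
The Dataset Splits entry describes a rotation over dataset folds: one fold is held out for meta-testing, one of the remaining folds is used for meta-validation, and the rest for meta-training. A minimal sketch of that bookkeeping, with hypothetical fold names and an assumed (cyclic) choice of the validation fold, is:

```python
# Hedged sketch of the fold rotation described in the Dataset Splits entry.
# Fold names are hypothetical placeholders.
def split_folds(folds, test_fold):
    """Hold out `test_fold` for meta-testing, take the next fold (cyclically)
    for meta-validation, and use all remaining folds for meta-training."""
    test_idx = folds.index(test_fold)
    val_fold = folds[(test_idx + 1) % len(folds)]   # assumption: "one of the remaining folds"
    train_folds = [f for f in folds if f not in (test_fold, val_fold)]
    return train_folds, val_fold, test_fold

folds = ["fold_0", "fold_1", "fold_2", "fold_3", "fold_4"]
print(split_folds(folds, "fold_2"))
# (['fold_0', 'fold_1', 'fold_4'], 'fold_3', 'fold_2')
```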
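
Because the Software Dependencies entry flags missing version numbers, anyone re-running the experiments may want to record the installed versions explicitly. The snippet below does that with the standard library; the distribution names ("timm", "syne-tune", "torch") are assumed PyPI names, not versions stated by the paper.

```python
# Record the versions of the libraries named in the paper.
# Distribution names ("timm", "syne-tune", "torch") are assumed PyPI names.
from importlib.metadata import version, PackageNotFoundError

for dist in ("timm", "syne-tune", "torch"):
    try:
        print(f"{dist}=={version(dist)}")
    except PackageNotFoundError:
        print(f"{dist}: not installed")
```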
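
The Experiment Setup entry fully specifies the surrogate architecture and training recipe: an MLP with 2 hidden layers of 32 neurons for both predictors, Adam with a learning rate of 10^-4, and 100 epochs of updates per BO iteration. A minimal PyTorch sketch matching those numbers is given below; the input dimensionality and training data are placeholders, not values from the paper.

```python
# Minimal PyTorch sketch of the predictor setup quoted above:
# an MLP with 2 hidden layers of 32 units, trained with Adam at lr 1e-4
# for 100 epochs per BO iteration. Input dimension and data are placeholders.
import torch
import torch.nn as nn

def make_predictor(input_dim: int) -> nn.Module:
    return nn.Sequential(
        nn.Linear(input_dim, 32), nn.ReLU(),
        nn.Linear(32, 32), nn.ReLU(),
        nn.Linear(32, 1),          # predicted validation loss (or cost)
    )

def fit_predictor(model, x, y, epochs=100, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x).squeeze(-1), y)
        loss.backward()
        opt.step()
    return model

# Placeholder usage with random pipeline encodings.
x = torch.randn(128, 16)   # 128 observed pipelines, 16-dim encoding (assumed)
y = torch.randn(128)       # observed validation losses (placeholder)
predictor = fit_predictor(make_predictor(16), x, y)
```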