Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How
Authors: Sebastian Pineda Arango, Fabio Ferreira, Arlind Kadra, Frank Hutter, Josif Grabocka
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To this aim, we evaluated over 20k hyperparameter configurations for finetuning 24 pretrained image classification models on 87 datasets to generate a large-scale meta-dataset. We empirically demonstrate that our resulting approach can quickly select an accurate pretrained model for a new dataset together with its optimal hyperparameters. |
| Researcher Affiliation | Academia | Sebastian Pineda Arango, Fabio Ferreira, Arlind Kadra, Frank Hutter & Josif Grabocka, Department of Computer Science, University of Freiburg, pineda@cs.uni-freiburg.de |
| Pseudocode | Yes | We formalize these steps in Algorithm 1, where we use the validation loss as a performance metric. The entire procedure is sped up by starting from a meta-learned surrogate as described in Section 4.4. Algorithm 1: Quick-Tune Algorithm. We present the procedure for meta-training the cost and loss predictors in Algorithm 2. Algorithm 2: Meta-training Algorithm. |
| Open Source Code | Yes | To facilitate reproducibility, we open-source our code and release our meta-dataset: https://github.com/releaunifreiburg/QuickTune |
| Open Datasets | Yes | In our experiments, we use the tasks contained in the Meta-Album benchmark (Ullah et al., 2022) since it contains a diverse set of computer vision datasets. |
| Dataset Splits | Yes | When searching for a pipeline on datasets of a given fold Di, we consider one of the remaining folds for meta-validation and the remaining ones for meta-training. |
| Hardware Specification | Yes | The configurations trained on the tasks from micro, mini, extended are finetuned for 1, 4, and 16 hours respectively, using a single NVIDIA GeForce RTX 2080 Ti GPU per finetuning task, amounting to a total compute time of 32 GPU months. |
| Software Dependencies | No | We base our study on the timm library (Wightman, 2019)... We use the Syne Tune library (Salinas et al., 2022) for the implementation of the baselines. The paper mentions software libraries like 'timm' and 'Syne Tune' but does not specify their version numbers. |
| Experiment Setup | Yes | The chosen setup uses an MLP with 2 hidden layers and 32 neurons per layer, for both predictors. We use the Adam optimizer with a learning rate of 10^-4 for fitting the estimators during the BO steps. We update their parameters for 100 epochs for every iteration from Algorithm 1. Further details on the set-up are specified in Appendix A.2. Table 7: Detailed Search Space for Curve Generation. |
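
The Pseudocode row quotes Algorithm 1, which repeatedly decides which pipeline (pretrained model plus hyperparameter configuration) to advance by another finetuning step, trading predicted improvement in validation loss against predicted cost. The sketch below is a minimal, self-contained illustration of such a cost-aware selection loop; the names `predict_loss`, `predict_cost`, and `advance_one_epoch` are hypothetical toy stand-ins, and a simple improvement-per-cost score replaces the paper's expected-improvement acquisition and meta-learned surrogate.

```python
# Minimal sketch of a cost-aware pipeline-selection loop in the spirit of
# Algorithm 1 (Quick-Tune).  All helpers below are toy placeholders, not the
# authors' implementation: the real method uses meta-learned loss and cost
# predictors and an expected-improvement acquisition on the validation loss.
import random

random.seed(0)

# Candidate pipelines: (pretrained model, hyperparameter configuration).
candidates = [
    {"model": "beit_large", "lr": 1e-3},
    {"model": "convnext_small", "lr": 1e-4},
    {"model": "resnet50", "lr": 5e-4},
]

# Observed validation-loss curves per candidate (one entry per finetuning epoch).
curves = {i: [] for i in range(len(candidates))}


def predict_loss(i):
    """Hypothetical loss predictor: extrapolate the next validation loss."""
    curve = curves[i]
    return curve[-1] * 0.9 if curve else 1.0  # optimistic toy extrapolation


def predict_cost(i):
    """Hypothetical cost predictor: estimated seconds for one more epoch."""
    return 60.0 + 10.0 * len(curves[i])


def advance_one_epoch(i):
    """Stand-in for actually finetuning candidate i for one more epoch."""
    prev = curves[i][-1] if curves[i] else 1.0
    return max(0.05, prev - random.uniform(0.0, 0.2))


budget, elapsed = 600.0, 0.0
best_loss = float("inf")
while elapsed < budget:
    # Greedy acquisition: predicted improvement over the incumbent per unit cost.
    scores = {
        i: max(best_loss - predict_loss(i), 0.0) / predict_cost(i)
        for i in range(len(candidates))
    }
    i = max(scores, key=scores.get)
    elapsed += predict_cost(i)        # toy bookkeeping: charge the predicted cost
    loss = advance_one_epoch(i)       # observe one more point of the curve
    curves[i].append(loss)
    best_loss = min(best_loss, loss)

best_i = min(curves, key=lambda i: min(curves[i]) if curves[i] else float("inf"))
print("selected pipeline:", candidates[best_i], "best val loss:", round(best_loss, 3))
```

In the actual method, both predictors are meta-trained on the released meta-dataset beforehand (Algorithm 2) and then refit after each observation on the new task.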
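
The Dataset Splits row describes a fold-rotation protocol: when searching for a pipeline on datasets of a given fold, one other fold is held out for meta-validation and the remaining folds are used for meta-training. The snippet below is a small illustration of that bookkeeping only; the number of folds and their contents are placeholders, not the Meta-Album assignment used in the paper.

```python
# Toy illustration of the meta-train / meta-validation / test fold rotation.
# Fold contents and the choice of validation fold are placeholders.
folds = {0: ["dataset_a"], 1: ["dataset_b"], 2: ["dataset_c"],
         3: ["dataset_d"], 4: ["dataset_e"]}


def split_for_test_fold(i, val_fold=None):
    """Return (meta_train, meta_val, test) dataset lists when fold i is the test fold."""
    val_fold = (i + 1) % len(folds) if val_fold is None else val_fold
    meta_train = [d for f, ds in folds.items() if f not in (i, val_fold) for d in ds]
    return meta_train, folds[val_fold], folds[i]


print(split_for_test_fold(0))
```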
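
The Experiment Setup row fixes the predictor architecture and optimizer: an MLP with two hidden layers of 32 units, fit with Adam at a learning rate of 10^-4 for 100 epochs at every iteration of Algorithm 1. The PyTorch sketch below mirrors that configuration on random toy data; the feature dimension, the plain MSE regression objective, and the data are assumptions for illustration, not the paper's full surrogate.

```python
# Hedged sketch of the predictor setup quoted above: a 2-hidden-layer MLP with
# 32 units per layer, refit with Adam (lr = 1e-4) for 100 epochs per BO step.
# FEATURE_DIM and the random data are placeholders, not the paper's encoding.
import torch
import torch.nn as nn

FEATURE_DIM = 16  # placeholder for the encoded pipeline/dataset features

predictor = nn.Sequential(
    nn.Linear(FEATURE_DIM, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# Toy observations standing in for (pipeline features, observed target).
x = torch.randn(64, FEATURE_DIM)
y = torch.rand(64, 1)

for _ in range(100):  # refit for 100 epochs at every BO iteration
    optimizer.zero_grad()
    loss = loss_fn(predictor(x), y)
    loss.backward()
    optimizer.step()
```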