Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning
Authors: Utku Evci, Vincent Dumoulin, Hugo Larochelle, Michael C Mozer
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In evaluations on the Visual Task Adaptation Benchmark (VTAB), Head2Toe matches performance obtained with fine-tuning on average while reducing training and storage cost a hundredfold or more, but critically, for out-of-distribution transfer, Head2Toe outperforms fine-tuning. |
| Researcher Affiliation | Industry | Google Research, Brain Team. Correspondence to: Utku Evci <evcu@google.com>. |
| Pseudocode | No | The paper describes its method in prose and mathematical equations (e.g., Equations 1 and 2), but it does not include a clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps; see the hedged sketch after the table. |
| Open Source Code | Yes | We open source our code at https://github.com/google-research/head2toe |
| Open Datasets | Yes | In our experiments, we use source models pretrained on ImageNet 2012 (Russakovsky et al., 2015)... Visual Task Adaptation Benchmark-1k (Zhai et al., 2019) to evaluate different methods. VTAB-1k consists of 19 different classification tasks, each having between 2 and 397 classes and a total of 1000 training examples. (A loading sketch follows the table.) |
| Dataset Splits | Yes | We perform five-fold cross validation for each task and method in order to pick the best hyperparameters. We pick hyperparameters for each VTAB task separately by doing a 5-fold cross validation on the training data. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware used for running its experiments, such as GPU models, CPU models, or cloud computing instance types. It refers to general training processes but lacks hardware specifications. |
| Software Dependencies | No | The paper refers to common model architectures like 'ResNet-50' and 'ViT-B/16' but does not specify the software dependencies with version numbers (e.g., TensorFlow 2.x, PyTorch 1.x, Python 3.x) required to replicate the experiment. |
| Experiment Setup | Yes | All methods search over the same learning rates and training steps (two values of each). More details on hyperparameter selection and values used are shared in Appendix A. For HEAD2TOE we choose ℓ2,1 regularization coefficients from (0.001, 0.00001) and target feature sizes from (1024, 16384, 40000) for ResNet-50 and (768, 15360, 32448) for ViT-B/16. (A sketch of this search protocol follows the table.) |
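
The Pseudocode row notes that the paper describes its method only in prose and equations. The snippet below is a hedged, minimal sketch of that described setup: a linear head trained on features concatenated from intermediate layers of a frozen backbone, with a group (ℓ2,1) penalty whose row-wise norms act as feature-importance scores. All function and variable names are illustrative assumptions, not taken from the released code.

```python
import tensorflow as tf

def l21_penalty(weights, eps=1e-12):
    # l2,1 norm: sum over feature groups (rows) of each row's l2 norm.
    # eps keeps the gradient finite when a row is exactly zero.
    return tf.reduce_sum(tf.sqrt(tf.reduce_sum(tf.square(weights), axis=1) + eps))

def train_linear_head(features, labels, num_classes,
                      reg_coef=1e-3, lr=0.01, steps=500):
    """features: [num_examples, d] activations pooled from the intermediate
    layers of a frozen backbone and concatenated; labels: [num_examples]."""
    d = features.shape[1]
    w = tf.Variable(tf.random.normal([d, num_classes], stddev=0.01))
    b = tf.Variable(tf.zeros([num_classes]))
    opt = tf.keras.optimizers.SGD(learning_rate=lr)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            logits = tf.matmul(features, w) + b
            ce = tf.reduce_mean(
                tf.keras.losses.sparse_categorical_crossentropy(
                    labels, logits, from_logits=True))
            loss = ce + reg_coef * l21_penalty(w)
        grads = tape.gradient(loss, [w, b])
        opt.apply_gradients(zip(grads, [w, b]))
    # Rows of w with larger l2 norm mark more relevant features; keeping only
    # the top-scoring ones would give a reduced feature set for a final head.
    importance = tf.norm(w, axis=1)
    return w, b, importance
```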
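The Open Datasets row refers to VTAB-1k, where each of the 19 tasks comes with a 1,000-example training budget. Below is a minimal, hedged sketch of loading one such task through TensorFlow Datasets; the task name and split string are illustrative, and the benchmark's official split definitions may differ.

```python
import tensorflow_datasets as tfds

# Caltech101 is one of the 19 VTAB tasks; 'train[:1000]' approximates the
# 1k-example budget. The official VTAB-1k splits may be defined differently.
train_ds = tfds.load('caltech101', split='train[:1000]', as_supervised=True)
test_ds = tfds.load('caltech101', split='test', as_supervised=True)

for image, label in train_ds.take(1):
    print(image.shape, int(label))
```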
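Finally, the Dataset Splits and Experiment Setup rows quote a small grid (two learning rates and training-step counts, plus ℓ2,1 coefficients and target feature sizes) selected by 5-fold cross-validation on each task's training data. The sketch below illustrates that protocol under stated assumptions: the learning-rate values are placeholders, and `train_and_evaluate` is a hypothetical stand-in for task-specific training and scoring.

```python
import itertools
import numpy as np
from sklearn.model_selection import KFold

GRID = {
    'reg_coef': [1e-3, 1e-5],                  # l2,1 coefficients quoted above
    'target_features': [1024, 16384, 40000],   # ResNet-50 target sizes quoted above
    'learning_rate': [0.01, 0.1],              # illustrative placeholder values
}

def select_hyperparameters(features, labels, train_and_evaluate):
    """features, labels: numpy arrays for one task's training set."""
    best_config, best_score = None, -np.inf
    kfold = KFold(n_splits=5, shuffle=True, random_state=0)
    for values in itertools.product(*GRID.values()):
        config = dict(zip(GRID.keys(), values))
        scores = []
        for train_idx, val_idx in kfold.split(features):
            scores.append(train_and_evaluate(
                features[train_idx], labels[train_idx],
                features[val_idx], labels[val_idx], **config))
        mean_score = float(np.mean(scores))
        if mean_score > best_score:
            best_config, best_score = config, mean_score
    return best_config, best_score
```

As in the paper, this search would be run separately for each of the 19 VTAB tasks.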