Scalable Transfer Learning with Expert Models

Authors: Joan Puigcerver, Carlos Riquelme Ruiz, Basil Mustafa, Cedric Renggli, André Susano Pinto, Sylvain Gelly, Daniel Keysers, Neil Houlsby

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our approach on two different data sources and demonstrate that it outperforms baselines on over 20 diverse vision tasks in both cases." and the section heading "6 EXPERIMENTAL RESULTS".
Researcher Affiliation | Collaboration | Joan Puigcerver (Google Research), Carlos Riquelme (Google Research), Basil Mustafa (Google Research), Cedric Renggli (ETH Zurich), André Susano Pinto (Google Research), Sylvain Gelly (Google Research), Daniel Keysers (Google Research), Neil Houlsby (Google Research).
Pseudocode | No | The paper describes the algorithm steps in text and a flow diagram (Figure 1) but does not provide structured pseudocode or an algorithm block.
Open Source Code | No | "We released 48 of these ImageNet21k models" (footnote 2: https://tfhub.dev/google/collections/experts/bit/1). This link points to pre-trained models, not to the source code of the described methodology; a minimal usage sketch follows the table.
Open Datasets | Yes | "ImageNet21k (Deng et al., 2009) is a public dataset containing 13 million images, and 14 million labels of 21 843 classes, which are WordNet synsets (Fellbaum, 2012)." and "JFT (Sun et al., 2017) is an even larger dataset containing 300 million images and 18 291 classes."
Dataset Splits | Yes | "In VTAB-1k we use the recommended hyperparameter sweep and 800-training/200-validation split." and "In both sets of tasks, we use 1k training examples per dataset."
Hardware Specification | Yes | "We pre-train generic models on a Cloud TPUv3-512, as done in (Kolesnikov et al., 2019)."
Software Dependencies | No | "For instance, ImageNet pre-training is popular since it is freely available and works well for many tasks (Donahue et al., 2014; Oquab et al., 2014; Sharif Razavian et al., 2014). In practice, this one-off down payment may not be made by the practitioner, since pre-trained networks are made available through platforms like PyTorch and TensorFlow Hub" (footnote 1: https://pytorch.org/hub/ and https://tfhub.dev/, respectively). The paper mentions software platforms but does not list specific version numbers for any libraries or dependencies.
Experiment Setup | No | "Hyperparameter Selection. In VTAB-1k we use the recommended hyperparameter sweep and 800-training/200-validation split." and "See appendices E.2 and F.1 for sweep details." The main text describes the process of hyperparameter selection, but defers the specific values to the appendices.
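
Although the source code is not released, the 48 ImageNet21k expert checkpoints linked in the Open Source Code row can be loaded from TF Hub as feature extractors. The snippet below is a minimal sketch, assuming TensorFlow 2 with the tensorflow_hub package installed; the specific model handle is an illustrative assumption (the collection page at https://tfhub.dev/google/collections/experts/bit/1 lists the actual released experts), not something taken from the paper.

    # Minimal sketch: load one released BiT expert checkpoint from TF Hub and use it
    # as a frozen feature extractor. The handle below is assumed for illustration;
    # substitute a handle from the collection page linked above.
    import tensorflow as tf
    import tensorflow_hub as hub

    EXPERT_HANDLE = "https://tfhub.dev/google/experts/bit/r50x1/in21k/food/1"  # assumed handle

    expert = hub.KerasLayer(EXPERT_HANDLE, trainable=False)

    images = tf.random.uniform((4, 224, 224, 3))  # dummy batch of RGB images in [0, 1]
    features = expert(images)                     # per-image embeddings from the expert
    print(features.shape)

In a transfer-learning setup along the lines of the paper, such an expert would typically be combined with a new task head and fine-tuned on the downstream dataset (e.g. the 1k-example VTAB-1k tasks mentioned in the Dataset Splits row).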