Scalable Transfer Learning with Expert Models
Authors: Joan Puigcerver, Carlos Riquelme Ruiz, Basil Mustafa, Cedric Renggli, André Susano Pinto, Sylvain Gelly, Daniel Keysers, Neil Houlsby
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate our approach on two different data sources and demonstrate that it outperforms baselines on over 20 diverse vision tasks in both cases." and the section heading "6 EXPERIMENTAL RESULTS" |
| Researcher Affiliation | Collaboration | Joan Puigcerver (Google Research); Carlos Riquelme (Google Research); Basil Mustafa (Google Research); Cedric Renggli (ETH Zurich); André Susano Pinto (Google Research); Sylvain Gelly (Google Research); Daniel Keysers (Google Research); Neil Houlsby (Google Research) |
| Pseudocode | No | The paper describes the algorithm steps in text and flow diagrams (Figure 1) but does not provide structured pseudocode or an algorithm block. |
| Open Source Code | No | "We released 48 of these ImageNet21k models" (https://tfhub.dev/google/collections/experts/bit/1). The link is for pre-trained models, not explicitly for the source code of the described methodology; a loading sketch follows the table. |
| Open Datasets | Yes | "ImageNet21k (Deng et al., 2009) is a public dataset containing 13 million images, and 14 million labels of 21 843 classes, which are WordNet synsets (Fellbaum, 2012)." and "JFT (Sun et al., 2017) is an even larger dataset containing 300 million images and 18 291 classes." |
| Dataset Splits | Yes | "In VTAB-1k we use the recommended hyperparameter sweep and 800-training/200-validation split." and "In both sets of tasks, we use 1k training examples per dataset." (A split sketch follows the table.) |
| Hardware Specification | Yes | "We pre-train generic models on a Cloud TPUv3-512, as done in (Kolesnikov et al., 2019)." |
| Software Dependencies | No | "For instance, ImageNet pre-training is popular since it is freely available and works well for many tasks (Donahue et al., 2014; Oquab et al., 2014; Sharif Razavian et al., 2014). In practice, this one-off down payment may not be made by the practitioner, since pre-trained networks are made available through platforms like PyTorch and TensorFlow Hub" (https://pytorch.org/hub/ and https://tfhub.dev/, respectively). The paper mentions software platforms but does not list specific version numbers for any libraries or dependencies. |
| Experiment Setup | No | "Hyperparameter Selection. In VTAB-1k we use the recommended hyperparameter sweep and 800-training/200-validation split." and "See appendices E.2 and F.1 for sweep details." The main text describes the process of hyperparameter selection, but defers the specific values to the appendices. |
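
The released checkpoints can be used directly as pre-trained backbones. Below is a minimal sketch of fine-tuning one of the ImageNet21k experts via TensorFlow Hub; the specific model handle is a hypothetical placeholder (pick a real entry from the collection page linked above), and the head size, input resolution, and optimizer settings are illustrative assumptions rather than the paper's exact fine-tuning recipe.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Hypothetical handle: replace with a real entry from
# https://tfhub.dev/google/collections/experts/bit/1
EXPERT_HANDLE = "https://tfhub.dev/google/experts/bit/r50x1/in21k/example/1"

NUM_CLASSES = 102  # size of the downstream label space (illustrative)

# Expert backbone from TF Hub plus a freshly initialised classification head.
model = tf.keras.Sequential([
    hub.KerasLayer(EXPERT_HANDLE, trainable=True),
    tf.keras.layers.Dense(NUM_CLASSES),
])
model.build([None, 224, 224, 3])  # BiT-style models take batches of RGB images

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=3e-3, momentum=0.9),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# model.fit(...) would then be called with the VTAB-1k split sketched below.
```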
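
For the dataset splits, the quoted protocol can be reproduced with TensorFlow Datasets slicing. This is a minimal sketch, assuming a VTAB task that is available in TFDS (oxford_flowers102 is used purely as an example); the split boundaries follow the 800-training/200-validation description quoted above.

```python
import tensorflow_datasets as tfds

TASK = "oxford_flowers102"  # illustrative VTAB task; any of the 19 tasks applies

# VTAB-1k sweep split: 1 000 training examples, divided 800/200.
train_ds = tfds.load(TASK, split="train[:800]", as_supervised=True)
val_ds = tfds.load(TASK, split="train[800:1000]", as_supervised=True)

# After hyperparameter selection, the standard VTAB-1k protocol retrains
# on all 1 000 examples and evaluates on the task's full test split.
trainval_ds = tfds.load(TASK, split="train[:1000]", as_supervised=True)
test_ds = tfds.load(TASK, split="test", as_supervised=True)
```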