Scalable Transfer Learning with Expert Models
Authors: Joan Puigcerver, Carlos Riquelme Ruiz, Basil Mustafa, Cedric Renggli, André Susano Pinto, Sylvain Gelly, Daniel Keysers, Neil Houlsby
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate our approach on two different data sources and demonstrate that it outperforms baselines on over 20 diverse vision tasks in both cases." and the section heading "6 EXPERIMENTAL RESULTS" |
| Researcher Affiliation | Collaboration | Joan Puigcerver (Google Research); Carlos Riquelme (Google Research); Basil Mustafa (Google Research); Cedric Renggli (ETH Zurich); André Susano Pinto (Google Research); Sylvain Gelly (Google Research); Daniel Keysers (Google Research); Neil Houlsby (Google Research) |
| Pseudocode | No | The paper describes the algorithm steps in text and flow diagrams (Figure 1) but does not provide structured pseudocode or an algorithm block. |
| Open Source Code | No | "We released 48 of these ImageNet21k models" (https://tfhub.dev/google/collections/experts/bit/1). The link is for pre-trained models, not explicitly for the source code of the described methodology; a loading sketch follows the table. |
| Open Datasets | Yes | "ImageNet21k (Deng et al., 2009) is a public dataset containing 13 million images, and 14 million labels of 21 843 classes, which are WordNet synsets (Fellbaum, 2012)." and "JFT (Sun et al., 2017) is an even larger dataset containing 300 million images and 18 291 classes." |
| Dataset Splits | Yes | "In VTAB-1k we use the recommended hyperparameter sweep and 800-training/200-validation split." and "In both sets of tasks, we use 1k training examples per dataset." (A split sketch follows the table.) |
| Hardware Specification | Yes | "We pre-train generic models on a Cloud TPUv3-512, as done in (Kolesnikov et al., 2019)." |
| Software Dependencies | No | "For instance, ImageNet pre-training is popular since it is freely available and works well for many tasks (Donahue et al., 2014; Oquab et al., 2014; Sharif Razavian et al., 2014). In practice, this one-off down payment may not be made by the practitioner, since pre-trained networks are made available through platforms like PyTorch and TensorFlow Hub" (https://pytorch.org/hub/ and https://tfhub.dev/, respectively). The paper mentions software platforms but does not list specific version numbers for any libraries or dependencies. |
| Experiment Setup | No | "Hyperparameter Selection. In VTAB-1k we use the recommended hyperparameter sweep and 800-training/200-validation split." and "See appendices E.2 and F.1 for sweep details." The main text describes the process of hyperparameter selection, but defers the specific values to the appendices. |
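
The released checkpoints can be used directly as pre-trained backbones. Below is a minimal sketch of fine-tuning one of the ImageNet21k experts via TensorFlow Hub; the specific model handle is a hypothetical placeholder (pick a real entry from the collection page linked above), and the head size, input resolution, and optimizer settings are illustrative assumptions rather than the paper's exact fine-tuning recipe.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Hypothetical handle: replace with a real entry from
# https://tfhub.dev/google/collections/experts/bit/1
EXPERT_HANDLE = "https://tfhub.dev/google/experts/bit/r50x1/in21k/example/1"

NUM_CLASSES = 102  # size of the downstream label space (illustrative)

# Expert backbone from TF Hub plus a freshly initialised classification head.
model = tf.keras.Sequential([
    hub.KerasLayer(EXPERT_HANDLE, trainable=True),
    tf.keras.layers.Dense(NUM_CLASSES),
])
model.build([None, 224, 224, 3])  # BiT-style models take batches of RGB images

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=3e-3, momentum=0.9),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# model.fit(...) would then be called with the VTAB-1k split sketched below.
```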
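
For the dataset splits, the quoted protocol can be reproduced with TensorFlow Datasets slicing. This is a minimal sketch, assuming a VTAB task that is available in TFDS (oxford_flowers102 is used purely as an example); the split boundaries follow the 800-training/200-validation description quoted above.

```python
import tensorflow_datasets as tfds

TASK = "oxford_flowers102"  # illustrative VTAB task; any of the 19 tasks applies

# VTAB-1k sweep split: 1 000 training examples, divided 800/200.
train_ds = tfds.load(TASK, split="train[:800]", as_supervised=True)
val_ds = tfds.load(TASK, split="train[800:1000]", as_supervised=True)

# After hyperparameter selection, the standard VTAB-1k protocol retrains
# on all 1 000 examples and evaluates on the task's full test split.
trainval_ds = tfds.load(TASK, split="train[:1000]", as_supervised=True)
test_ds = tfds.load(TASK, split="test", as_supervised=True)
```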