Zero-shot AutoML with Pretrained Models

Authors: Ekrem Öztürk, Fabio Ferreira, Hadi Jomaa, Lars Schmidt-Thieme, Josif Grabocka, Frank Hutter

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach under the strict time limit of the vision track of the ChaLearn AutoDL challenge benchmark, clearly outperforming all challenge contenders. (Section 5: Experiments)
Researcher Affiliation | Collaboration | ¹University of Freiburg, ²University of Hildesheim, ³Bosch Center for Artificial Intelligence.
Pseudocode | No | The paper describes methods using text and mathematical equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | To foster reproducibility, we make our PyTorch (Paszke et al., 2019) code, models, and data publicly available under this URL.
Open Datasets | Yes | With this preference in mind, we retrieved 35 core datasets provided by the TensorFlow (Abadi et al., 2015) Datasets (TFDS) utility library (Google, 2021) and applied a dataset augmentation process (Stoll, 2020)... Table 7 (domains of the original datasets):
Objects: Cifar100 (Krizhevsky et al., 2009), Cifar10, Horses or Humans (Moroney, 2019a), CycleGAN Horse2zebra (Zhu et al., 2017), CycleGAN Facades, CycleGAN Apple2orange, Imagenette (Howard), Coil100 (Nene et al., 1996), Stanford Dogs (Khosla et al., 2011), Rock, Paper and Scissors (Moroney, 2019b), TF Flowers (Team, 2019), Cassava (Mwebaze et al., 2019), Fashion MNIST (Xiao et al., 2017), Cars196 (Krause et al., 2013), Cats vs Dogs (Elson et al., 2007), ImageNet Resized 32x32 (Chrabaszcz et al., 2017)
Characters: Cmaterdb Devanagari (Das et al., 2012a;b), Cmaterdb Bangla, MNIST (LeCun et al., 2010), KMNIST (Clanuwat et al., 2018), EMNIST Byclass (Cohen et al., 2017), EMNIST MNIST, Cmaterdb Telugu, EMNIST Balanced, Omniglot (Lake et al., 2015), SVHN Cropped (Netzer et al., 2011)
Medical: Colorectal Histology (Kather et al., 2016), Malaria (Rajaraman et al., 2018)
Aerial: UC Merced (Yang & Newsam, 2010), CycleGAN Maps, Eurosat RGB (Helber et al., 2017)
Drawings/Pictures: CycleGAN Vangogh2photo, CycleGAN Ukiyoe2photo
(A TFDS loading sketch appears below the table.)
Dataset Splits | Yes | Let $X := \{x_n\}_{n=1}^{N}$ denote a set of $N$ distinct deep learning (DL) pipelines. Every DL pipeline $x_n := (M_n, \theta_n)$ comprises a pre-trained model $M_n \in M$ and fine-tuning hyperparameters $\theta_n \in \Theta$ that are used to fine-tune $M_n$ to a given dataset. Furthermore, let $D = \{D_i\}_{i=1}^{I}$ denote a collection of $I$ datasets, where each dataset $D_i \in D$ is split into disjoint training, validation and testing subsets $D_i := D_i^{(tr)} \cup D_i^{(val)} \cup D_i^{(test)}$. We further apply 5-fold inner cross-validation to optimally identify the best stopping epoch while monitoring validation performance. (A split-and-inner-CV sketch appears below the table.)
Hardware Specification | Yes | The specification of our machines is the following: AMD EPYC 7502 32-Core Processor, NVIDIA GeForce RTX 2080 Ti, 500 GB RAM, CUDA version 11.5, Ubuntu 20.04.3 LTS.
Software Dependencies | No | The paper mentions key software components such as PyTorch (Paszke et al., 2019) and TensorFlow (Abadi et al., 2015), but gives their versions only implicitly via the publication years of their respective papers rather than as explicit version numbers (e.g., PyTorch 1.x). It does specify CUDA version 11.5 and Ubuntu 20.04.3 LTS, which are system software.
Experiment Setup | Yes | Overall, our DL pipeline space $X$ is comprised of 26 hyperparameters of the types real- and integer-valued, categorical, and conditional. A condensed version is presented in Table 6. Specifically, we used the hyperparameter optimization method BOHB (Falkner et al., 2018), which supports high-dimensional and categorical hyperparameter spaces, to find a (near-)optimal instantiation of our DL pipeline space for each dataset. We optimized the anytime Area under the Learning Curve (ALC) score (introduced in the AutoDL challenge (Liu et al., 2021) and described in more detail in Section 5.1) via BOHB, with a budget of five minutes for evaluating one DL pipeline on one dataset. (An ALC reconstruction and a BOHB sketch appear below the table.)
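
As a concrete companion to the Open Datasets row, here is a minimal sketch of retrieving one of the listed core datasets through the TensorFlow Datasets (TFDS) utility library. The dataset name and split percentages are illustrative choices, not the paper's, and the dataset augmentation process (Stoll, 2020) is not reproduced here.

```python
# Minimal TFDS retrieval sketch. "cifar100" is one of the 35 listed core
# datasets; the split fractions below are assumptions for illustration.
import tensorflow_datasets as tfds

train_ds, val_ds, test_ds = tfds.load(
    "cifar100",
    split=["train[:80%]", "train[80%:]", "test"],  # illustrative split, not the paper's
    as_supervised=True,                            # yields (image, label) pairs
)

for image, label in train_ds.take(1):
    print(image.shape, int(label))
```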
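The Dataset Splits row describes disjoint train/validation/test subsets plus 5-fold inner cross-validation to pick the best stopping epoch. The sketch below, assuming scikit-learn and a hypothetical train_and_eval callback, mirrors that protocol; the split ratios and the epoch grid are placeholders, not the paper's settings.

```python
# Sketch of the split-plus-inner-CV protocol; the ratios, the epoch grid,
# and train_and_eval are all hypothetical placeholders.
import numpy as np
from sklearn.model_selection import KFold, train_test_split

def pick_stopping_epoch(X, y, train_and_eval, max_epochs=20, seed=0):
    """train_and_eval(X_tr, y_tr, X_va, y_va, epochs) -> validation score."""
    # Disjoint D^(tr), D^(val), D^(test) subsets (illustrative 60/20/20 split).
    X_trval, X_test, y_trval, y_test = train_test_split(X, y, test_size=0.2, random_state=seed)
    X_tr, X_val, y_tr, y_val = train_test_split(X_trval, y_trval, test_size=0.25, random_state=seed)

    # 5-fold inner cross-validation on D^(tr) over candidate stopping epochs,
    # monitoring performance on each held-out fold.
    mean_scores = np.zeros(max_epochs)
    for tr_idx, va_idx in KFold(n_splits=5, shuffle=True, random_state=seed).split(X_tr):
        for epoch in range(1, max_epochs + 1):
            mean_scores[epoch - 1] += train_and_eval(
                X_tr[tr_idx], y_tr[tr_idx], X_tr[va_idx], y_tr[va_idx], epoch
            ) / 5.0
    return int(np.argmax(mean_scores)) + 1  # epoch with the best mean validation score
```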
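The Experiment Setup row optimizes the anytime ALC score. The following is a hedged reconstruction from the AutoDL challenge description (Liu et al., 2021), not a formula quoted from this paper: wall-clock time $t$ is rescaled logarithmically and the anytime score $s(t)$ is integrated over transformed time, where $t_0$ is a time-scaling constant and $T$ the overall time budget.

```latex
% Hedged reconstruction of the AutoDL anytime metric (Liu et al., 2021):
% logarithmic time transform, then area under the learning curve.
\tilde{t}(t) = \frac{\log\left(1 + t / t_0\right)}{\log\left(1 + T / t_0\right)},
\qquad
\mathrm{ALC} = \int_0^1 s\bigl(t(\tilde{t})\bigr)\, d\tilde{t}
```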
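Finally, a minimal sketch of the BOHB setup described in the Experiment Setup row, assuming the ConfigSpace and hpbandster libraries published by the BOHB authors. The three-hyperparameter space (with one conditional) and the dummy loss are stand-ins for the paper's 26-dimensional DL pipeline space and its ALC objective; max_budget is set to 300 seconds to echo the five-minute per-pipeline budget.

```python
# BOHB sketch with a toy stand-in for the 26-hyperparameter pipeline space
# (Table 6); the loss below is a dummy placeholder for 1 - ALC.
import ConfigSpace as CS
import hpbandster.core.nameserver as hpns
from hpbandster.core.worker import Worker
from hpbandster.optimizers import BOHB

def build_space():
    cs = CS.ConfigurationSpace()
    model = CS.CategoricalHyperparameter("model", ["resnet18", "efficientnet_b0"])
    lr = CS.UniformFloatHyperparameter("lr", 1e-5, 1e-1, log=True)
    wd = CS.UniformFloatHyperparameter("weight_decay", 1e-6, 1e-2, log=True)
    cs.add_hyperparameters([model, lr, wd])
    # Example conditional hyperparameter, mirroring the conditionality of the real space.
    cs.add_condition(CS.EqualsCondition(wd, model, "resnet18"))
    return cs

class PipelineWorker(Worker):
    def compute(self, config, budget, **kwargs):
        # Real use: fine-tune config["model"] for `budget` seconds and
        # report 1 - ALC; a dummy loss keeps this sketch runnable.
        return {"loss": config["lr"], "info": {"budget": budget}}

ns = hpns.NameServer(run_id="zap_sketch", host="127.0.0.1", port=None)
ns.start()
PipelineWorker(run_id="zap_sketch", nameserver="127.0.0.1").run(background=True)
opt = BOHB(configspace=build_space(), run_id="zap_sketch", nameserver="127.0.0.1",
           min_budget=60, max_budget=300)  # 300 s mirrors the five-minute budget
result = opt.run(n_iterations=4)
opt.shutdown(shutdown_workers=True)
ns.shutdown()
```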