Efficient Multi-Objective Neural Architecture Search via Lamarckian Evolution

Authors: Thomas Elsken, Jan Hendrik Metzen, Frank Hutter

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate LEMONADE for up to five objectives on two different search spaces for image classification: (i) non-modularized architectures and (ii) cells that are used as repeatable building blocks within an architecture (Zoph et al., 2018; Zhong et al., 2018) and also allow transfer to other data sets. LEMONADE returns a population of CNNs covering architectures with 10,000 to 10,000,000 parameters. Within only 5 days on 16 GPUs, LEMONADE discovers architectures that are competitive in terms of predictive performance and resource consumption with hand-designed networks, such as MobileNetV2 (Sandler et al., 2018), as well as architectures that were automatically designed using 40x greater resources (Zoph et al., 2018) and other multi-objective methods (Dong et al., 2018).
Researcher Affiliation | Collaboration | Thomas Elsken (Bosch Center for Artificial Intelligence and University of Freiburg) Thomas.Elsken@de.bosch.com; Jan Hendrik Metzen (Bosch Center for Artificial Intelligence) JanHendrik.Metzen@de.bosch.com; Frank Hutter (University of Freiburg) fh@cs.uni-freiburg.de
Pseudocode | Yes | Algorithm 1: LEMONADE
Open Source Code | No | The paper does not provide an explicit statement or link regarding the availability of open-source code for the described methodology.
Open Datasets | Yes | We present results for LEMONADE on searching neural architectures for CIFAR-10. ... We also transfer the discovered cells from the last setting to ImageNet (Section 5.4) and its down-scaled version ImageNet64x64 (Chrabaszcz et al., 2017) (Section 5.3).
Dataset Splits | Yes | The training set is split up in a training (45.000) and a validation (5.000) set for the purpose of architecture search.
Hardware Specification | Yes | Within only 5 days on 16 GPUs, LEMONADE discovers architectures that are competitive in terms of predictive performance and resource consumption with hand-designed networks... In terms of inference time (bottom right), LEMONADE clearly finds models superior to the baselines. We highlight that this result has been achieved based on using only 80 GPU days for LEMONADE compared to 2000 in Zoph et al. (2018) and with a significantly more complex Search Space I... In detail, we measured the time for doing inference on a batch of 100 images on a Titan X GPU.
Software Dependencies | No | The paper mentions SGD, Batch Normalization, and cosine annealing as methods but does not specify software dependencies such as programming language versions or library versions (e.g., PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | We apply the standard data augmentation scheme described by Loshchilov & Hutter (2017), as well as the recently proposed methods mixup (Zhang et al., 2017) and Cutout (Devries & Taylor, 2017). The training set is split up in a training (45.000) and a validation (5.000) set for the purpose of architecture search. We use weight decay (5·10^-4) for all models. We use batch size 64 throughout all experiments. During architecture search as well as for generating the random search baseline, all models are trained for 20 epochs using SGD with cosine annealing (Loshchilov & Hutter, 2017), decaying the learning rate from 0.01 to 0. For evaluating the test performance, all models are trained from scratch on the training and validation set with the same setup as described above except for 1) we train for 600 epochs and 2) the initial learning rate is set to 0.025.
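
The Pseudocode row above refers to the paper's Algorithm 1, which describes LEMONADE's evolutionary loop. The following is a heavily simplified, hypothetical Python sketch of that loop (all function and parameter names are placeholders, not the authors' code, and the two-stage child sampling of the full algorithm is omitted): parents are sampled inversely proportional to a density estimate over a cheap objective, children are generated by (approximate) network morphisms, and the Pareto front over the cheap and expensive objectives becomes the next population.

```python
import random

def pareto_front(population, objectives):
    """Return the subset of `population` not dominated on any objective (minimization)."""
    front = []
    for cand in population:
        dominated = any(
            all(obj(other) <= obj(cand) for obj in objectives)
            and any(obj(other) < obj(cand) for obj in objectives)
            for other in population if other is not cand
        )
        if not dominated:
            front.append(cand)
    return front

def lemonade(init_population, n_generations, n_children,
             cheap_obj, expensive_obj, kde_density,
             mutate_via_morphism, train_briefly):
    """Simplified LEMONADE loop; all callables are user-supplied placeholders."""
    population = list(init_population)
    for _ in range(n_generations):
        # Sample parents inversely proportional to their density w.r.t. the cheap
        # objective (e.g. log #params), so sparsely populated regions are explored.
        dens = [kde_density(cheap_obj(p), population) for p in population]
        weights = [1.0 / max(d, 1e-12) for d in dens]
        parents = random.choices(population, weights=weights, k=n_children)
        # Children are generated by (approximate) network morphisms, so they
        # inherit the parent's weights (Lamarckian inheritance) and only need
        # a short additional training phase before evaluation.
        children = [mutate_via_morphism(p) for p in parents]
        for child in children:
            train_briefly(child)
        # Keep only the Pareto-optimal individuals w.r.t. all objectives.
        population = pareto_front(population + children, [expensive_obj, cheap_obj])
    return population
```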
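
The 45,000 / 5,000 split quoted in the Dataset Splits row could be reproduced along these lines. The paper does not name its deep-learning framework (see the Software Dependencies row), so the use of PyTorch/torchvision below is an assumption, and the fixed index-based split is a guess rather than the authors' exact procedure:

```python
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # augmentation (mixup, Cutout, ...) omitted for brevity
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transform)

# Fixed split of the 50,000 CIFAR-10 training images: 45,000 for training and
# 5,000 for validation during architecture search.
train_set = Subset(full_train, range(45_000))
val_set = Subset(full_train, range(45_000, 50_000))

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)   # batch size 64 as stated
val_loader = DataLoader(val_set, batch_size=64, shuffle=False)
```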
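
The inference-time measurement quoted in the Hardware Specification row (a batch of 100 images on a Titan X GPU) could be approximated as below; the warm-up runs, repeat count, and use of PyTorch are assumptions, since the paper does not describe the exact timing protocol:

```python
import time
import torch

def measure_inference_time(model, n_repeats=10):
    """Average forward-pass time for a batch of 100 CIFAR-sized images on the GPU."""
    model = model.eval().cuda()
    batch = torch.randn(100, 3, 32, 32, device="cuda")
    with torch.no_grad():
        for _ in range(3):                      # warm-up runs (assumption)
            model(batch)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(n_repeats):
            model(batch)
        torch.cuda.synchronize()
    return (time.time() - start) / n_repeats
```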
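
Finally, the hyperparameters in the Experiment Setup row map onto a standard SGD-plus-cosine-annealing configuration. The sketch below assumes PyTorch (not stated in the paper) and a momentum value of 0.9, which the paper does not specify:

```python
import torch

def make_optimizer_and_scheduler(model, search_phase=True):
    """SGD + cosine annealing as described; momentum 0.9 is an assumption."""
    lr = 0.01 if search_phase else 0.025        # architecture search vs. final evaluation
    epochs = 20 if search_phase else 600
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9,
                                weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                           T_max=epochs, eta_min=0.0)
    return optimizer, scheduler, epochs
```

Calling scheduler.step() once per epoch would then anneal the learning rate from its initial value down to (roughly) zero by the end of training, matching the described schedule.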