Regularized Evolution for Image Classifier Architecture Search

Authors: Esteban Real, Alok Aggarwal, Yanping Huang, Quoc V. Le

AAAI 2019, pp. 4780-4789 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here, we evolve an image classifier AmoebaNet-A that surpasses hand-designs for the first time. ... Scaled to larger size, AmoebaNet-A sets a new state-of-the-art 83.9% top-1 / 96.6% top-5 ImageNet accuracy. In a controlled comparison against a well-known reinforcement learning algorithm, we give evidence that evolution can obtain results faster with the same hardware, especially at the earlier stages of the search. This is relevant when fewer compute resources are available. Evolution is, thus, a simple method to effectively discover high-quality architectures. ... We ran controlled comparisons at scale, ensuring identical conditions for evolution, RL and random search (RS). In particular, all methods used the same computer code for network construction, training and evaluation. Experiments always searched on the CIFAR-10 dataset (Krizhevsky and Hinton 2009).
Researcher Affiliation | Industry | Esteban Real, Alok Aggarwal, Yanping Huang, Quoc V. Le; Google Brain, Mountain View, California, USA. Correspondence to E. Real at ereal@google.com.
Pseudocode | Yes | Algorithm 1: Aging Evolution (i.e., Regularized Evolution)
population ← empty queue                        # The population.
history ← ∅                                     # Will contain all models.
while |population| < P do                       # Initialize population.
  model.arch ← RANDOMARCHITECTURE()
  model.accuracy ← TRAINANDEVAL(model.arch)
  add model to right of population
  add model to history
end while
while |history| < C do                          # Evolve for C cycles.
  sample ← ∅                                    # Parent candidates.
  while |sample| < S do
    candidate ← random element from population  # The element stays in the population.
    add candidate to sample
  end while
  parent ← highest-accuracy model in sample
  child.arch ← MUTATE(parent.arch)
  child.accuracy ← TRAINANDEVAL(child.arch)
  add child to right of population
  add child to history
  remove dead from left of population           # Oldest.
  discard dead
end while
return highest-accuracy model in history
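As an aside, the following is a minimal Python sketch of the aging-evolution loop above; random_architecture, mutate, and train_and_eval are placeholder callables standing in for the paper's RANDOMARCHITECTURE, MUTATE, and TRAINANDEVAL, which in the paper construct, perturb, and train actual image classifiers.

# Minimal sketch of aging (regularized) evolution as described in Algorithm 1.
# random_architecture, mutate, and train_and_eval are placeholder hooks.
import collections
import random

def regularized_evolution(P, S, C, random_architecture, mutate, train_and_eval):
    population = collections.deque()  # oldest model sits at the left
    history = []                      # every model ever evaluated

    # Initialize the population with P random architectures.
    while len(population) < P:
        arch = random_architecture()
        model = {"arch": arch, "accuracy": train_and_eval(arch)}
        population.append(model)
        history.append(model)

    # Evolve until C models have been evaluated in total.
    while len(history) < C:
        # Draw S candidates with replacement; sampled models stay in the population.
        sample = [random.choice(population) for _ in range(S)]
        parent = max(sample, key=lambda m: m["accuracy"])

        child_arch = mutate(parent["arch"])
        child = {"arch": child_arch, "accuracy": train_and_eval(child_arch)}
        population.append(child)
        history.append(child)

        # Aging: remove the oldest model regardless of its accuracy.
        population.popleft()

    return max(history, key=lambda m: m["accuracy"])

The only selection pressure is in picking the parent from the sample; removal is strictly by age, which is the aging behavior the authors identify with regularization.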
Open Source Code | Yes | We open-sourced the code: https://colab.research.google.com/github/google-research/google-research/blob/master/evolution/regularized_evolution_algorithm/regularized_evolution.ipynb ... We open-sourced code and checkpoint: https://tfhub.dev/google/imagenet/amoebanet_a_n18_f448/classification/1
Open Datasets | Yes | Experiments always searched on the CIFAR-10 dataset (Krizhevsky and Hinton 2009). ... CIFAR-10 dataset (Krizhevsky and Hinton 2009) with 5k withheld examples for validation. Standard ImageNet dataset (Deng et al. 2009), 1.2M 331x331 images and 1k classes; 50k examples withheld for validation; standard validation set used for testing.
Dataset Splits | Yes | CIFAR-10 dataset (Krizhevsky and Hinton 2009) with 5k withheld examples for validation. Standard ImageNet dataset (Deng et al. 2009), 1.2M 331x331 images and 1k classes; 50k examples withheld for validation; standard validation set used for testing.
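For the CIFAR-10 portion of these splits, a small illustrative sketch of withholding 5k of the 50k training examples for validation follows; the paper does not say which examples were withheld, so the random selection and seed here are assumptions.

# Illustrative split only: 45k train / 5k validation from CIFAR-10's 50k
# training examples. The authors' actual choice of withheld examples is unknown.
import numpy as np

def split_cifar10_train(images, labels, num_val=5000, seed=0):
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    val_idx, train_idx = order[:num_val], order[num_val:]
    return ((images[train_idx], labels[train_idx]),
            (images[val_idx], labels[val_idx]))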
Hardware Specification | Yes | Each experiment ran on 450 K40 GPUs for 20k models (approx. 7 days). ... For the ImageNet table, N/F were 6/190 and 6/448 and standard training methods (Szegedy et al. 2017): distributed sync SGD with 100 P100 GPUs.
Software Dependencies | No | The paper mentions implementing methods and using optimizers (e.g., 'RL was implemented using the algorithm and code in the baseline study (Zoph et al. 2018)', 'SGD with momentum rate 0.9', 'RMSProp optimizer'), but it does not specify any software packages or libraries with version numbers (e.g., 'Python 3.x', 'TensorFlow 2.x', 'PyTorch 1.x').
Experiment Setup | Yes | Evolved with P=100, S=25. ... During the search phase, each model trained for 25 epochs; N=3/F=24, 1 GPU. ... Best few (20) models were selected from each experiment and augmented to N=6/F=32, as in the baseline study; batch 128, SGD with momentum rate 0.9, L2 weight decay 5e-4, initial lr 0.024 with cosine decay, 600 epochs, ScheduledDropPath to 0.7 probability; auxiliary softmax with half the weight of the main softmax. ... RMSProp optimizer with 0.9 decay and epsilon=0.1, 4e-5 weight decay, 0.1 label smoothing, auxiliary softmax weighted by 0.4; dropout probability 0.5; ScheduledDropPath to 0.7 probability; 0.001 initial lr, decaying every 2 epochs by 0.97.
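To make the CIFAR-10 augmentation-phase numbers above concrete, here is a sketch of the reported optimizer settings (batch 128, SGD with momentum 0.9, initial learning rate 0.024 with cosine decay over 600 epochs) expressed with TF/Keras schedule and optimizer classes. This is not the authors' training code; the 5e-4 L2 weight decay, ScheduledDropPath, and auxiliary softmax head are only noted in comments.

# Sketch of the reported CIFAR-10 augmentation-phase optimizer settings.
# Model construction, ScheduledDropPath, the auxiliary softmax head, and the
# 5e-4 L2 weight decay (per-layer kernel regularizers) are omitted here.
import tensorflow as tf

BATCH_SIZE = 128
EPOCHS = 600
STEPS_PER_EPOCH = 45000 // BATCH_SIZE  # 50k CIFAR-10 train minus 5k validation

lr_schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=0.024,
    decay_steps=EPOCHS * STEPS_PER_EPOCH,
)
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)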