AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
Authors: Esteban Real, Chen Liang, David R. So, Quoc V. Le
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate this by introducing a novel framework that significantly reduces human bias through a generic search space. Despite the vastness of this space, evolutionary search can still discover two-layer neural networks trained by backpropagation. These simple neural networks can then be surpassed by evolving directly on tasks of interest, e.g. CIFAR-10 variants, where modern techniques emerge in the top algorithms, such as bilinear interactions, normalized gradients, and weight averaging. For more realistic datasets, we use binary classification tasks extracted from CIFAR-10 and MNIST. |
| Researcher Affiliation | Industry | Esteban Real*, Chen Liang*, David R. So, Quoc V. Le (*equal contribution), Google Brain/Google Research, Mountain View, CA, USA. |
| Pseudocode | Yes | Figure 1: Algorithm evaluation on one task. We represent an algorithm as a program with three component functions (Setup, Predict, Learn). These are evaluated by the pseudo-code above, producing a mean loss for each task. (A minimal Python sketch of this evaluation loop appears below the table.) |
| Open Source Code | Yes | 1We open-source our code at https://github.com/google-research/google-research/tree/master/automl_zero |
| Open Datasets | Yes | Unless otherwise stated, we use binary classification tasks extracted from CIFAR-10, a collection of tiny images each labeled with object classes (Krizhevsky & Hinton, 2009)... To make sure the improvement is not specific to CIFAR-10, we further show the gain generalizes to other datasets: SVHN (Netzer et al., 2011), ImageNet (Chrabaszcz et al., 2017), and Fashion MNIST (Xiao et al., 2017). |
| Dataset Splits | Yes | Once the search experiments are done, we select the best candidate by measuring their performances on another subset of tasks T_select ⊂ T (analogous to standard ML model selection with a validation set). ... For both datasets, the 45 pairs of the 10 classes yield tasks with 8000 train / 2000 valid examples. (A sketch of this task construction and split appears below the table.) |
| Hardware Specification | No | The paper mentions running experiments on 'commodity CPU core' and in a distributed fashion across 'worker processes', but it does not specify exact CPU models, GPU models, or other detailed hardware specifications. |
| Software Dependencies | No | The paper provides a link to open-source code but does not explicitly list specific software dependencies with their version numbers in the text. |
| Experiment Setup | Yes | Experiment Details: we generate simple regression tasks with 1000 training and 100 validation examples with random 8-dim. feature vectors {xi} and scalar labels {L(xi)}... Details: Generally, we use T = 10, 100 <= P <= 1000. Each child algorithm is mutated with probability U = 0.9. Run time: 5 days. ... For both datasets, the 45 pairs of the 10 classes yield tasks with 8000 train / 2000 valid examples. ... Features are projected to 8 <= F <= 256 dim. Each evaluation is on 1 <= D <= 10 tasks. W = 10k. From now on, we use the full setup described in Section 3.2. In particular, we allow variable component function length. Number of possible ops: 7 / 58 / 58 for Setup / Predict / Learn, respectively. (A regularized-evolution sketch using these parameters appears below the table.) |
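The Figure 1 pseudocode cited in the Pseudocode row evaluates a candidate algorithm by running Setup once per task, interleaving Predict and Learn over the training examples, and averaging the validation loss across tasks. A minimal sketch of that loop, assuming an illustrative `algorithm` object exposing `setup`/`predict`/`learn` methods and squared error as a stand-in for the paper's loss:

```python
import numpy as np

def evaluate(algorithm, tasks):
    """Mean validation loss across tasks, following the Figure 1 pseudocode.

    `algorithm`, its `setup`/`predict`/`learn` interface, and the squared-error
    loss are illustrative assumptions, not the released implementation.
    """
    task_losses = []
    for task in tasks:
        memory = algorithm.setup()                  # initialize variables (e.g. weights)
        for x, y in task.train_examples:            # one pass over the training set
            algorithm.predict(memory, x)            # forward pass
            algorithm.learn(memory, x, y)           # update variables from the error
        losses = [(algorithm.predict(memory, x) - y) ** 2
                  for x, y in task.valid_examples]  # held-out pass only predicts
        task_losses.append(np.mean(losses))
    return float(np.mean(task_losses))              # fitness signal for the search
```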
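The Open Datasets and Dataset Splits rows describe how tasks are constructed: all 45 one-vs-one class pairs of CIFAR-10 (or MNIST), each split into 8000 train / 2000 valid examples, with features randomly projected to a low dimension (8 <= F <= 256). A sketch under those assumptions; the function name and signature are hypothetical, not from the released code:

```python
import itertools
import numpy as np

def make_binary_tasks(features, labels, f_dim=16, n_train=8000, n_valid=2000, seed=0):
    """Build 45 one-vs-one tasks from a 10-class dataset such as CIFAR-10.

    Hypothetical helper: assumes `features`/`labels` are numpy arrays with at
    least n_train + n_valid examples available per class pair.
    """
    rng = np.random.default_rng(seed)
    tasks = []
    for a, b in itertools.combinations(range(10), 2):  # 45 class pairs
        mask = (labels == a) | (labels == b)
        x = features[mask].reshape(mask.sum(), -1).astype(np.float32)
        y = (labels[mask] == a).astype(np.float32)     # binary relabeling
        proj = rng.normal(size=(x.shape[1], f_dim))    # random low-dim projection
        x = x @ proj
        idx = rng.permutation(len(x))[: n_train + n_valid]
        train, valid = idx[:n_train], idx[n_train:]
        tasks.append(((x[train], y[train]), (x[valid], y[valid])))
    return tasks
```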
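The Experiment Setup row gives the search hyperparameters: tournament size T = 10, population size 100 <= P <= 1000, and mutation probability U = 0.9. A minimal sketch of regularized evolution (tournament selection with aging) using those values; `init_algorithm`, `mutate`, and the fitness convention (higher is better, e.g. validation accuracy) are assumptions for illustration:

```python
import collections
import copy
import random

def regularized_evolution(init_algorithm, mutate, fitness,
                          population_size=1000, tournament_size=10,
                          mutation_prob=0.9, num_cycles=10000):
    """Tournament selection with aging, per the paper's T = 10, U = 0.9 setup.

    `init_algorithm`, `mutate`, and `fitness` (higher is better) are
    illustrative stand-ins; `num_cycles` is arbitrary.
    """
    population = collections.deque()                 # oldest individual on the left
    for _ in range(population_size):
        algo = init_algorithm()                      # e.g. an empty program
        population.append((algo, fitness(algo)))
    best = max(population, key=lambda p: p[1])
    for _ in range(num_cycles):
        # Best of T randomly sampled individuals becomes the parent.
        tournament = random.sample(list(population), tournament_size)
        parent = max(tournament, key=lambda p: p[1])[0]
        child = (mutate(parent) if random.random() < mutation_prob
                 else copy.deepcopy(parent))         # copied through unchanged otherwise
        child_fitness = fitness(child)
        population.append((child, child_fitness))    # add the child...
        population.popleft()                         # ...and age out the oldest
        if child_fitness > best[1]:
            best = (child, child_fitness)
    return best
```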