AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
Authors: Esteban Real, Chen Liang, David R. So, Quoc V. Le
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate this by introducing a novel framework that significantly reduces human bias through a generic search space. Despite the vastness of this space, evolutionary search can still discover two-layer neural networks trained by backpropagation. These simple neural networks can then be surpassed by evolving directly on tasks of interest, e.g. CIFAR-10 variants, where modern techniques emerge in the top algorithms, such as bilinear interactions, normalized gradients, and weight averaging. For more realistic datasets, we use binary classification tasks extracted from CIFAR-10 and MNIST. |
| Researcher Affiliation | Industry | Esteban Real*, Chen Liang*, David R. So, Quoc V. Le (*equal contribution), Google Brain/Google Research, Mountain View, CA, USA. |
| Pseudocode | Yes | Figure 1: Algorithm evaluation on one task. We represent an algorithm as a program with three component functions (Setup, Predict, Learn). These are evaluated by the pseudo-code above, producing a mean loss for each task. (A minimal Python sketch of this evaluation loop appears below the table.) |
| Open Source Code | Yes | 1We open-source our code at https://github.com/google-research/google-research/tree/master/automl_zero |
| Open Datasets | Yes | Unless otherwise stated, we use binary classification tasks extracted from CIFAR-10, a collection of tiny images each labeled with object classes (Krizhevsky & Hinton, 2009)... To make sure the improvement is not specific to CIFAR-10, we further show the gain generalizes to other datasets: SVHN (Netzer et al., 2011), ImageNet (Chrabaszcz et al., 2017), and Fashion MNIST (Xiao et al., 2017). |
| Dataset Splits | Yes | Once the search experiments are done, we select the best candidate by measuring their performances on another subset of tasks T_select ⊂ T (analogous to standard ML model selection with a validation set). ... For both datasets, the 45 pairs of the 10 classes yield tasks with 8000 train / 2000 valid examples. (A sketch of this task construction and split appears below the table.) |
| Hardware Specification | No | The paper mentions running experiments on 'commodity CPU core' and in a distributed fashion across 'worker processes', but it does not specify exact CPU models, GPU models, or other detailed hardware specifications. |
| Software Dependencies | No | The paper provides a link to open-source code but does not explicitly list specific software dependencies with their version numbers in the text. |
| Experiment Setup | Yes | Experiment Details: we generate simple regression tasks with 1000 training and 100 validation examples with random 8-dim. feature vectors {xi} and scalar labels {L(xi)}... Details: Generally, we use T = 10, 100 <= P <= 1000. Each child algorithm is mutated with probability U = 0.9. Run time: 5 days. ... For both datasets, the 45 pairs of the 10 classes yield tasks with 8000 train / 2000 valid examples. ... Features are projected to 8 <= F <= 256 dim. Each evaluation is on 1 <= D <= 10 tasks. W = 10k. From now on, we use the full setup described in Section 3.2. In particular, we allow variable component function length. Number of possible ops: 7 / 58 / 58 for Setup / Predict / Learn, respectively. (A regularized-evolution sketch using these parameters appears below the table.) |
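The Figure 1 pseudocode cited in the Pseudocode row evaluates a candidate algorithm by running Setup once per task, interleaving Predict and Learn over the training examples, and averaging the validation loss across tasks. A minimal sketch of that loop, assuming an illustrative `algorithm` object exposing `setup`/`predict`/`learn` methods and squared error as a stand-in for the paper's loss:

```python
import numpy as np

def evaluate(algorithm, tasks):
    """Mean validation loss across tasks, following the Figure 1 pseudocode.

    `algorithm`, its `setup`/`predict`/`learn` interface, and the squared-error
    loss are illustrative assumptions, not the released implementation.
    """
    task_losses = []
    for task in tasks:
        memory = algorithm.setup()                  # initialize variables (e.g. weights)
        for x, y in task.train_examples:            # one pass over the training set
            algorithm.predict(memory, x)            # forward pass
            algorithm.learn(memory, x, y)           # update variables from the error
        losses = [(algorithm.predict(memory, x) - y) ** 2
                  for x, y in task.valid_examples]  # held-out pass only predicts
        task_losses.append(np.mean(losses))
    return float(np.mean(task_losses))              # fitness signal for the search
```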
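The Open Datasets and Dataset Splits rows describe how tasks are constructed: all 45 one-vs-one class pairs of CIFAR-10 (or MNIST), each split into 8000 train / 2000 valid examples, with features randomly projected to a low dimension (8 <= F <= 256). A sketch under those assumptions; the function name and signature are hypothetical, not from the released code:

```python
import itertools
import numpy as np

def make_binary_tasks(features, labels, f_dim=16, n_train=8000, n_valid=2000, seed=0):
    """Build 45 one-vs-one tasks from a 10-class dataset such as CIFAR-10.

    Hypothetical helper: assumes `features`/`labels` are numpy arrays with at
    least n_train + n_valid examples available per class pair.
    """
    rng = np.random.default_rng(seed)
    tasks = []
    for a, b in itertools.combinations(range(10), 2):  # 45 class pairs
        mask = (labels == a) | (labels == b)
        x = features[mask].reshape(mask.sum(), -1).astype(np.float32)
        y = (labels[mask] == a).astype(np.float32)     # binary relabeling
        proj = rng.normal(size=(x.shape[1], f_dim))    # random low-dim projection
        x = x @ proj
        idx = rng.permutation(len(x))[: n_train + n_valid]
        train, valid = idx[:n_train], idx[n_train:]
        tasks.append(((x[train], y[train]), (x[valid], y[valid])))
    return tasks
```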
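The Experiment Setup row gives the search hyperparameters: tournament size T = 10, population size 100 <= P <= 1000, and mutation probability U = 0.9. A minimal sketch of regularized evolution (tournament selection with aging) using those values; `init_algorithm`, `mutate`, and the fitness convention (higher is better, e.g. validation accuracy) are assumptions for illustration:

```python
import collections
import copy
import random

def regularized_evolution(init_algorithm, mutate, fitness,
                          population_size=1000, tournament_size=10,
                          mutation_prob=0.9, num_cycles=10000):
    """Tournament selection with aging, per the paper's T = 10, U = 0.9 setup.

    `init_algorithm`, `mutate`, and `fitness` (higher is better) are
    illustrative stand-ins; `num_cycles` is arbitrary.
    """
    population = collections.deque()                 # oldest individual on the left
    for _ in range(population_size):
        algo = init_algorithm()                      # e.g. an empty program
        population.append((algo, fitness(algo)))
    best = max(population, key=lambda p: p[1])
    for _ in range(num_cycles):
        # Best of T randomly sampled individuals becomes the parent.
        tournament = random.sample(list(population), tournament_size)
        parent = max(tournament, key=lambda p: p[1])[0]
        child = (mutate(parent) if random.random() < mutation_prob
                 else copy.deepcopy(parent))         # copied through unchanged otherwise
        child_fitness = fitness(child)
        population.append((child, child_fitness))    # add the child...
        population.popleft()                         # ...and age out the oldest
        if child_fitness > best[1]:
            best = (child, child_fitness)
    return best
```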