Understanding and Simplifying One-Shot Architecture Search

Authors: Gabriel Bender, Pieter-Jan Kindermans, Barret Zoph, Vijay Vasudevan, Quoc Le

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "With careful experimental analysis, we show that it is possible to efficiently identify promising architectures from a complex search space without either hypernetworks or RL." and Section 4, "One-Shot Model Experiments".
Researcher Affiliation | Industry | "Google Brain, Mountain View, CA. Correspondence to: Gabriel Bender <gbender@google.com>."
Pseudocode | No | The paper includes diagrams but no explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not mention an open-source code release or provide a link to a code repository for the described methodology.
Open Datasets | Yes | "On CIFAR-10 we used a 45,000 element training set, 5,000 element validation set, and 10,000 element test set. ImageNet was partitioned into a 1,281,167 element training set, 50,046 element validation set, and 50,000 element test set."
Dataset Splits | Yes | Same evidence as Open Datasets: the paper reports a 45,000/5,000/10,000 split for CIFAR-10 and a 1,281,167/50,046/50,000 split for ImageNet (a split sketch appears after the table).
Hardware Specification | Yes | "Each one-shot model was trained for 5,000–10,000 steps (113–225 epochs) on a cluster of 16 P100 GPUs." and "The One-Shot model was trained for 15k steps (about 47 epochs or 6 hours) with a batch size of 4,096 on four Cloud TPUs (16 chips)."
Software Dependencies | No | "Experiments were implemented using TensorFlow (Abadi et al., 2016)." However, no version number is given for TensorFlow or any other software dependency.
Experiment Setup | Yes | "Each worker used a batch size of 64, which was divided into two ghost batches of size 32. We used a global learning rate of 0.1 and Nesterov momentum 0.9." and "At the start of training, dropout was effectively disabled, while at the end of training, we had a dropout rate determined by the coefficient r = 0.1." (a training-configuration sketch follows the table).
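
To make the reported CIFAR-10 partition concrete, below is a minimal sketch of the 45,000/5,000/10,000 split using tf.keras.datasets. Holding out the last 5,000 training images for validation is our assumption; the paper does not specify how the validation set was sampled.

```python
import tensorflow as tf

# CIFAR-10 ships as 50,000 training images and 10,000 test images.
(x_train_full, y_train_full), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Hold out 5,000 images for validation to match the paper's
# 45,000 / 5,000 / 10,000 split. Which images are held out is an
# assumption; the paper does not specify the sampling.
x_train, y_train = x_train_full[:45_000], y_train_full[:45_000]
x_val, y_val = x_train_full[45_000:], y_train_full[45_000:]

assert x_train.shape[0] == 45_000
assert x_val.shape[0] == 5_000
assert x_test.shape[0] == 10_000
```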
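
The experiment-setup evidence (ghost batches of 32 inside a per-worker batch of 64, SGD with learning rate 0.1 and Nesterov momentum 0.9, and a dropout rate ramped up over training toward r = 0.1) can be expressed as a short configuration sketch. This assumes the virtual_batch_size argument of tf.keras.layers.BatchNormalization (available in TF 2.x before Keras 3); the linear ramp and the use of r as a single dropout rate are illustrative simplifications of the paper's per-path schedule, not its exact formula.

```python
import tensorflow as tf

# Ghost batch normalization: a per-worker batch of 64 is normalized
# as two virtual batches of 32, as described in the quoted setup.
def conv_bn_relu(x, filters):
    x = tf.keras.layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization(virtual_batch_size=32)(x)
    return tf.keras.layers.ReLU()(x)

# SGD with the reported global learning rate and Nesterov momentum.
optimizer = tf.keras.optimizers.SGD(
    learning_rate=0.1, momentum=0.9, nesterov=True)

# Dropout is effectively disabled at the start of training and ramps
# toward a final rate governed by r = 0.1. A linear ramp of a single
# rate is an illustrative simplification of the paper's schedule.
def dropout_rate(step, total_steps, r=0.1):
    return r * min(step / max(total_steps, 1), 1.0)
```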