Understanding and Simplifying One-Shot Architecture Search
Authors: Gabriel Bender, Pieter-Jan Kindermans, Barret Zoph, Vijay Vasudevan, Quoc Le
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "With careful experimental analysis, we show that it is possible to efficiently identify promising architectures from a complex search space without either hypernetworks or RL." See also Section 4, "One-Shot Model Experiments". |
| Researcher Affiliation | Industry | "Google Brain, Mountain View, CA. Correspondence to: Gabriel Bender <gbender@google.com>." |
| Pseudocode | No | The paper includes diagrams but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not mention any open-source code release or provide links to a code repository for the described methodology. |
| Open Datasets | Yes | "On CIFAR-10 we used a 45,000 element training set, 5,000 element validation set, and 10,000 element test set. ImageNet was partitioned into a 1,281,167 element training set, 50,046 element validation set, and 50,000 element test set." |
| Dataset Splits | Yes | "On CIFAR-10 we used a 45,000 element training set, 5,000 element validation set, and 10,000 element test set. ImageNet was partitioned into a 1,281,167 element training set, 50,046 element validation set, and 50,000 element test set." |
| Hardware Specification | Yes | "Each one-shot model was trained for 5,000–10,000 steps (113–225 epochs) on a cluster of 16 P100 GPUs." and "The one-shot model was trained for 15k steps (about 47 epochs or 6 hours) with a batch size of 4,096 on four Cloud TPUs (16 chips)." |
| Software Dependencies | No | "Experiments were implemented using TensorFlow (Abadi et al., 2016)." However, no specific version number for TensorFlow or any other software dependency is provided. |
| Experiment Setup | Yes | "Each worker used a batch size of 64, which was divided into two ghost batches of size 32. We used a global learning rate of 0.1 and Nesterov momentum 0.9." and "At the start of training, dropout was effectively disabled, while at the end of training, we had a dropout rate determined by the coefficient r = 0.1." |
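
The dataset-split rows above fully determine the CIFAR-10 partition. As a minimal sketch (assuming the standard Keras CIFAR-10 loader; the paper does not say how the 5,000 validation examples were selected), the split can be reproduced as:

```python
# Hypothetical reconstruction of the CIFAR-10 split described above;
# the choice of which 5,000 images form the validation set is an assumption.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Hold out the last 5,000 of the 50,000 training images for validation,
# leaving a 45,000 element training set and the standard 10,000 test set.
x_val, y_val = x_train[45_000:], y_train[45_000:]
x_train, y_train = x_train[:45_000], y_train[:45_000]

assert len(x_train) == 45_000 and len(x_val) == 5_000 and len(x_test) == 10_000
```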
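The experiment-setup row likewise pins down the optimizer and batching. Below is a hypothetical TensorFlow sketch of those settings (per-worker batch size 64 split into two ghost batches of 32, SGD with learning rate 0.1 and Nesterov momentum 0.9); the gradient-averaging treatment of ghost batches is an assumption, since the paper reports the ghost batch sizes but not the implementation.

```python
# Hypothetical sketch of the reported training setup; not the authors'
# released code. Assumed names: `model`, `loss_fn`, `train_step`.
import tensorflow as tf

BATCH_SIZE = 64      # per-worker batch size reported in the paper
GHOST_BATCH = 32     # each batch is divided into two ghost batches
LEARNING_RATE = 0.1  # global learning rate reported in the paper
MOMENTUM = 0.9       # Nesterov momentum reported in the paper

# SGD with Nesterov momentum, matching the Experiment Setup row.
optimizer = tf.keras.optimizers.SGD(
    learning_rate=LEARNING_RATE, momentum=MOMENTUM, nesterov=True)

def train_step(model, loss_fn, images, labels):
    """One step on a batch of 64, processed as two ghost batches of 32.

    Running the forward/backward pass separately on each half keeps the
    batch-norm statistics per ghost batch; averaging the gradients before
    applying them is an assumption, as the paper does not spell out the
    implementation.
    """
    accumulated = None
    losses = []
    for half in (slice(0, GHOST_BATCH), slice(GHOST_BATCH, BATCH_SIZE)):
        with tf.GradientTape() as tape:
            logits = model(images[half], training=True)
            loss = loss_fn(labels[half], logits)
        grads = tape.gradient(loss, model.trainable_variables)
        accumulated = grads if accumulated is None else [
            a + g for a, g in zip(accumulated, grads)]
        losses.append(loss)
    optimizer.apply_gradients(
        zip([g / 2.0 for g in accumulated], model.trainable_variables))
    return tf.reduce_mean(tf.stack(losses))
```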