Understanding and Simplifying One-Shot Architecture Search
Authors: Gabriel Bender, Pieter-Jan Kindermans, Barret Zoph, Vijay Vasudevan, Quoc Le
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "With careful experimental analysis, we show that it is possible to efficiently identify promising architectures from a complex search space without either hypernetworks or RL." See also Section 4, "One-Shot Model Experiments". |
| Researcher Affiliation | Industry | "Google Brain, Mountain View, CA. Correspondence to: Gabriel Bender <gbender@google.com>." |
| Pseudocode | No | The paper includes diagrams but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not mention any open-source code release or provide links to a code repository for the described methodology. |
| Open Datasets | Yes | "On CIFAR-10 we used a 45,000 element training set, 5,000 element validation set, and 10,000 element test set. ImageNet was partitioned into a 1,281,167 element training set, 50,046 element validation set, and 50,000 element test set." |
| Dataset Splits | Yes | "On CIFAR-10 we used a 45,000 element training set, 5,000 element validation set, and 10,000 element test set. ImageNet was partitioned into a 1,281,167 element training set, 50,046 element validation set, and 50,000 element test set." |
| Hardware Specification | Yes | "Each one-shot model was trained for 5,000–10,000 steps (113–225 epochs) on a cluster of 16 P100 GPUs." and "The one-shot model was trained for 15k steps (about 47 epochs or 6 hours) with a batch size of 4,096 on four Cloud TPUs (16 chips)." |
| Software Dependencies | No | "Experiments were implemented using TensorFlow (Abadi et al., 2016)." However, no specific version number for TensorFlow or any other software dependency is provided. |
| Experiment Setup | Yes | "Each worker used a batch size of 64, which was divided into two ghost batches of size 32. We used a global learning rate of 0.1 and Nesterov momentum 0.9." and "At the start of training, dropout was effectively disabled, while at the end of training, we had a dropout rate determined by the coefficient r = 0.1." |
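
The dataset-split rows above fully determine the CIFAR-10 partition. As a minimal sketch (assuming the standard Keras CIFAR-10 loader; the paper does not say how the 5,000 validation examples were selected), the split can be reproduced as:

```python
# Hypothetical reconstruction of the CIFAR-10 split described above;
# the choice of which 5,000 images form the validation set is an assumption.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Hold out the last 5,000 of the 50,000 training images for validation,
# leaving a 45,000 element training set and the standard 10,000 test set.
x_val, y_val = x_train[45_000:], y_train[45_000:]
x_train, y_train = x_train[:45_000], y_train[:45_000]

assert len(x_train) == 45_000 and len(x_val) == 5_000 and len(x_test) == 10_000
```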
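The experiment-setup row likewise pins down the optimizer and batching. Below is a hypothetical TensorFlow sketch of those settings (per-worker batch size 64 split into two ghost batches of 32, SGD with learning rate 0.1 and Nesterov momentum 0.9); the gradient-averaging treatment of ghost batches is an assumption, since the paper reports the ghost batch sizes but not the implementation.

```python
# Hypothetical sketch of the reported training setup; not the authors'
# released code. Assumed names: `model`, `loss_fn`, `train_step`.
import tensorflow as tf

BATCH_SIZE = 64      # per-worker batch size reported in the paper
GHOST_BATCH = 32     # each batch is divided into two ghost batches
LEARNING_RATE = 0.1  # global learning rate reported in the paper
MOMENTUM = 0.9       # Nesterov momentum reported in the paper

# SGD with Nesterov momentum, matching the Experiment Setup row.
optimizer = tf.keras.optimizers.SGD(
    learning_rate=LEARNING_RATE, momentum=MOMENTUM, nesterov=True)

def train_step(model, loss_fn, images, labels):
    """One step on a batch of 64, processed as two ghost batches of 32.

    Running the forward/backward pass separately on each half keeps the
    batch-norm statistics per ghost batch; averaging the gradients before
    applying them is an assumption, as the paper does not spell out the
    implementation.
    """
    accumulated = None
    losses = []
    for half in (slice(0, GHOST_BATCH), slice(GHOST_BATCH, BATCH_SIZE)):
        with tf.GradientTape() as tape:
            logits = model(images[half], training=True)
            loss = loss_fn(labels[half], logits)
        grads = tape.gradient(loss, model.trainable_variables)
        accumulated = grads if accumulated is None else [
            a + g for a, g in zip(accumulated, grads)]
        losses.append(loss)
    optimizer.apply_gradients(
        zip([g / 2.0 for g in accumulated], model.trainable_variables))
    return tf.reduce_mean(tf.stack(losses))
```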