SMASH: One-Shot Model Architecture Search through HyperNetworks
Authors: Andrew Brock, Theo Lim, J.M. Ritchie, Nick Weston
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our method (SMASH) on CIFAR-10 and CIFAR-100, STL-10, ModelNet10, and ImageNet32x32, achieving competitive performance with similarly-sized hand-designed networks. |
| Researcher Affiliation | Collaboration | Andrew Brock, Theodore Lim, & J.M. Ritchie School of Engineering and Physical Sciences Heriot-Watt University Edinburgh, UK {ajb5, t.lim, j.m.ritchie}@hw.ac.uk Nick Weston Renishaw plc Research Ave, North Edinburgh, UK Nick.Weston@renishaw.com |
| Pseudocode | Yes | Algorithm 1 SMASH |
| Open Source Code | Yes | Our publicly available code is written in PyTorch (Paszke et al., 2017) to leverage dynamic graphs... https://github.com/ajbrock/SMASH |
| Open Datasets | Yes | We validate our One-Shot Model Architecture Search through HyperNetworks (SMASH) for Convolutional Neural Networks (CNN) on CIFAR-10 and CIFAR-100 (Krizhevsky and Hinton, 2009), ImageNet32x32 (Chrabaszcz et al., 2017), ModelNet10 (Wu et al., 2015), and STL-10 (Coates et al., 2011) |
| Dataset Splits | Yes | First, we train a SMASH network for 300 epochs on CIFAR-100, using a standard annealing schedule (Huang et al., 2017), then sample 250 random architectures and evaluate their SMASH score on a held-out validation set formed of 5,000 random examples from the original training set. (A minimal sketch of this split appears below the table.) |
| Hardware Specification | No | The paper states: "For the CIFAR experiments, we train the SMASH network for 100 epochs and the resulting networks for 300 epochs, using a batch size of 50 on a single GPU." This only mentions "a single GPU" without specifying its model or any other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions software like "PyTorch (Paszke et al., 2017)", "Adam (Kingma and Ba, 2014)", and "Nesterov Momentum". While it names the software and references the foundational papers, it does not provide specific version numbers (e.g., PyTorch 1.9) for these dependencies, which would be required for a fully reproducible software specification. |
| Experiment Setup | Yes | Our HyperNet is a 26-layer DenseNet, each layer of which comprises a LeakyReLU activation followed by a 3x3 convolution with simplified WeightNorm and no biases. We do not use bottleneck blocks, dropout, or other normalizers in the HyperNet... When training SMASH, we use Adam (Kingma and Ba, 2014) with the initial parameters proposed by (Radford et al., 2015). When training a resulting network, we use Nesterov Momentum with an initial step size of 0.1 and a momentum value of 0.9. For all tests other than the initial SMASHv1 experiments, we employ a cosine annealing schedule (Loshchilov and Hutter, 2017) without restarts (Gastaldi, 2017). For the CIFAR experiments, we train the SMASH network for 100 epochs and the resulting networks for 300 epochs, using a batch size of 50 on a single GPU. On ModelNet10, we train for 100 epochs. On ImageNet32x32, we train for 55 epochs. On STL-10, we train for 300 epochs when using the full training set, and 500 epochs when using the 10-fold training splits. (An illustrative optimizer sketch appears below the table.) |
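The Dataset Splits row quotes a held-out validation set of 5,000 random examples drawn from the CIFAR-100 training set. Below is a minimal PyTorch-style sketch of how such a split could be formed; the random seed and batch size are illustrative assumptions, not values taken from the paper.

```python
import torch
from torch.utils.data import random_split, DataLoader
from torchvision import datasets, transforms

# Load the full CIFAR-100 training set (50,000 examples).
transform = transforms.ToTensor()
full_train = datasets.CIFAR100(root="./data", train=True, download=True,
                               transform=transform)

# Reserve 5,000 random examples for validation; keep the rest for training.
val_size = 5_000
train_size = len(full_train) - val_size  # 45,000 examples remain for training
generator = torch.Generator().manual_seed(0)  # seed is an assumption, not from the paper
train_set, val_set = random_split(full_train, [train_size, val_size],
                                  generator=generator)

train_loader = DataLoader(train_set, batch_size=50, shuffle=True)
val_loader = DataLoader(val_set, batch_size=50, shuffle=False)
```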
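The following sketch illustrates the optimizer settings quoted in the Experiment Setup row, assuming the "initial parameters proposed by (Radford et al., 2015)" are the DCGAN defaults (lr = 2e-4, betas = (0.5, 0.999)). The model objects are stand-in placeholders, not the actual SMASH HyperNet or a sampled architecture.

```python
import torch
import torch.nn as nn

hypernet = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1))   # placeholder for the 26-layer DenseNet HyperNet
child_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1))  # placeholder for a sampled resulting network

# SMASH (HyperNet) training: Adam with DCGAN-style hyperparameters (assumed).
smash_optimizer = torch.optim.Adam(hypernet.parameters(), lr=2e-4, betas=(0.5, 0.999))

# Resulting-network training: Nesterov momentum SGD, step size 0.1, momentum 0.9,
# with cosine annealing and no restarts over 300 epochs (CIFAR setting).
epochs = 300
child_optimizer = torch.optim.SGD(child_net.parameters(), lr=0.1,
                                  momentum=0.9, nesterov=True)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(child_optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... one training epoch over CIFAR with batch size 50 would go here ...
    scheduler.step()
```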