SMASH: One-Shot Model Architecture Search through HyperNetworks
Authors: Andrew Brock, Theo Lim, J.M. Ritchie, Nick Weston
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our method (SMASH) on CIFAR-10 and CIFAR-100, STL-10, ModelNet10, and ImageNet32x32, achieving competitive performance with similarly-sized hand-designed networks. |
| Researcher Affiliation | Collaboration | Andrew Brock, Theodore Lim, & J.M. Ritchie School of Engineering and Physical Sciences Heriot-Watt University Edinburgh, UK {ajb5, t.lim, j.m.ritchie}@hw.ac.uk Nick Weston Renishaw plc Research Ave, North Edinburgh, UK Nick.Weston@renishaw.com |
| Pseudocode | Yes | Algorithm 1 SMASH |
| Open Source Code | Yes | Our publicly available code is written in PyTorch (Paszke et al., 2017) to leverage dynamic graphs... https://github.com/ajbrock/SMASH |
| Open Datasets | Yes | We validate our One-Shot Model Architecture Search through HyperNetworks (SMASH) for Convolutional Neural Networks (CNN) on CIFAR-10 and CIFAR-100 (Krizhevsky and Hinton, 2009), ImageNet32x32 (Chrabaszcz et al., 2017), ModelNet10 (Wu et al., 2015), and STL-10 (Coates et al., 2011) |
| Dataset Splits | Yes | First, we train a SMASH network for 300 epochs on CIFAR-100, using a standard annealing schedule (Huang et al., 2017), then sample 250 random architectures and evaluate their SMASH score on a held-out validation set formed of 5,000 random examples from the original training set. (A minimal sketch of this split appears below the table.) |
| Hardware Specification | No | The paper states: "For the CIFAR experiments, we train the SMASH network for 100 epochs and the resulting networks for 300 epochs, using a batch size of 50 on a single GPU." This only mentions "a single GPU" without specifying its model or any other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions software like "PyTorch (Paszke et al., 2017)", "Adam (Kingma and Ba, 2014)", and "Nesterov Momentum". While it names the software and references the foundational papers, it does not provide specific version numbers (e.g., PyTorch 1.9) for these dependencies, which would be required for a fully reproducible software specification. |
| Experiment Setup | Yes | Our HyperNet is a 26-layer DenseNet, each layer of which comprises a LeakyReLU activation followed by a 3x3 convolution with simplified WeightNorm and no biases. We do not use bottleneck blocks, dropout, or other normalizers in the HyperNet... When training SMASH, we use Adam (Kingma and Ba, 2014) with the initial parameters proposed by (Radford et al., 2015). When training a resulting network, we use Nesterov Momentum with an initial step size of 0.1 and a momentum value of 0.9. For all tests other than the initial SMASHv1 experiments, we employ a cosine annealing schedule (Loshchilov and Hutter, 2017) without restarts (Gastaldi, 2017). For the CIFAR experiments, we train the SMASH network for 100 epochs and the resulting networks for 300 epochs, using a batch size of 50 on a single GPU. On ModelNet10, we train for 100 epochs. On ImageNet32x32, we train for 55 epochs. On STL-10, we train for 300 epochs when using the full training set, and 500 epochs when using the 10-fold training splits. (An illustrative optimizer sketch appears below the table.) |
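The Dataset Splits row quotes a held-out validation set of 5,000 random examples drawn from the CIFAR-100 training set. Below is a minimal PyTorch-style sketch of how such a split could be formed; the random seed and batch size are illustrative assumptions, not values taken from the paper.

```python
import torch
from torch.utils.data import random_split, DataLoader
from torchvision import datasets, transforms

# Load the full CIFAR-100 training set (50,000 examples).
transform = transforms.ToTensor()
full_train = datasets.CIFAR100(root="./data", train=True, download=True,
                               transform=transform)

# Reserve 5,000 random examples for validation; keep the rest for training.
val_size = 5_000
train_size = len(full_train) - val_size  # 45,000 examples remain for training
generator = torch.Generator().manual_seed(0)  # seed is an assumption, not from the paper
train_set, val_set = random_split(full_train, [train_size, val_size],
                                  generator=generator)

train_loader = DataLoader(train_set, batch_size=50, shuffle=True)
val_loader = DataLoader(val_set, batch_size=50, shuffle=False)
```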
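The following sketch illustrates the optimizer settings quoted in the Experiment Setup row, assuming the "initial parameters proposed by (Radford et al., 2015)" are the DCGAN defaults (lr = 2e-4, betas = (0.5, 0.999)). The model objects are stand-in placeholders, not the actual SMASH HyperNet or a sampled architecture.

```python
import torch
import torch.nn as nn

hypernet = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1))   # placeholder for the 26-layer DenseNet HyperNet
child_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1))  # placeholder for a sampled resulting network

# SMASH (HyperNet) training: Adam with DCGAN-style hyperparameters (assumed).
smash_optimizer = torch.optim.Adam(hypernet.parameters(), lr=2e-4, betas=(0.5, 0.999))

# Resulting-network training: Nesterov momentum SGD, step size 0.1, momentum 0.9,
# with cosine annealing and no restarts over 300 epochs (CIFAR setting).
epochs = 300
child_optimizer = torch.optim.SGD(child_net.parameters(), lr=0.1,
                                  momentum=0.9, nesterov=True)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(child_optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... one training epoch over CIFAR with batch size 50 would go here ...
    scheduler.step()
```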