DARTS: Differentiable Architecture Search
Authors: Hanxiao Liu, Karen Simonyan, Yiming Yang
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on CIFAR-10, ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques. |
| Researcher Affiliation | Collaboration | Hanxiao Liu CMU hanxiaol@cs.cmu.edu Karen Simonyan DeepMind simonyan@google.com Yiming Yang CMU yiming@cs.cmu.edu Current affiliation: Google Brain. |
| Pseudocode | Yes | Algorithm 1: DARTS Differentiable Architecture Search (a minimal sketch of its alternating update appears below the table) |
| Open Source Code | Yes | Our implementation has been made publicly available to facilitate further research on efficient architecture search algorithms. The implementation of DARTS is available at https://github.com/quark0/darts |
| Open Datasets | Yes | Extensive experiments on CIFAR-10, ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling... |
| Dataset Splits | Yes | To carry out architecture search, we hold out half of the CIFAR-10 training data as the validation set. We run DARTS four times with different random seeds and pick the best cell based on its validation performance obtained by training from scratch for a short period (100 epochs on CIFAR-10 and 300 epochs on PTB). (A data-split sketch appears below the table.) |
| Hardware Specification | Yes | All of our experiments were performed using NVIDIA GTX 1080Ti GPUs. The training takes 3 days on a single 1080Ti GPU with our PyTorch implementation. |
| Software Dependencies | No | The paper mentions using 'PyTorch' but does not specify a version number for it or any other key software dependencies. |
| Experiment Setup | Yes | A small network of 8 cells is trained using DARTS for 50 epochs, with batch size 64 (for both the training and validation sets) and the initial number of channels 16. We use momentum SGD to optimize the weights w, with initial learning rate ηw = 0.025 (annealed down to zero following a cosine schedule without restart (Loshchilov & Hutter, 2016)), momentum 0.9, and weight decay 3×10⁻⁴. We use zero initialization for architecture variables (the α's in both the normal and reduction cells)... We use Adam (Kingma & Ba, 2014) as the optimizer for α, with initial learning rate ηα = 3×10⁻⁴, momentum β = (0.5, 0.999) and weight decay 10⁻³. (An illustrative optimizer setup is sketched below the table.) |
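
The Pseudocode row cites Algorithm 1, which alternates a descent step on the architecture parameters α (using validation data) with a descent step on the weights w (using training data). Below is a minimal sketch of that alternation in its first-order form (ξ = 0; the paper also evaluates a second-order variant with a virtual gradient step, which is omitted here). The `model`, `criterion`, and optimizer arguments are assumed interfaces, not the authors' code.

```python
def search_step(model, criterion, w_optimizer, alpha_optimizer,
                train_batch, valid_batch):
    """One alternating update of DARTS (first-order approximation)."""
    x_train, y_train = train_batch
    x_valid, y_valid = valid_batch

    # Step 1: update architecture parameters alpha by descending the
    # validation loss (alpha_optimizer holds only the alpha tensors).
    alpha_optimizer.zero_grad()
    criterion(model(x_valid), y_valid).backward()
    alpha_optimizer.step()

    # Step 2: update network weights w by descending the training loss
    # (w_optimizer holds only the weight tensors).
    w_optimizer.zero_grad()
    criterion(model(x_train), y_train).backward()
    w_optimizer.step()
```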
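
For the Dataset Splits row, the paper holds out half of the 50,000 CIFAR-10 training images as the architecture-search validation set. A minimal sketch of such a 50/50 split with torchvision follows; the `ToTensor`-only transform and batch size of 64 are illustrative assumptions, not the authors' exact preprocessing.

```python
import numpy as np
import torch
import torchvision
import torchvision.transforms as transforms

train_data = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True,
    transform=transforms.ToTensor())

indices = np.random.permutation(len(train_data))  # 50,000 training images
split = len(indices) // 2                         # 25,000 / 25,000

# One queue for updating the weights w, one for updating alpha.
train_queue = torch.utils.data.DataLoader(
    train_data, batch_size=64,
    sampler=torch.utils.data.SubsetRandomSampler(indices[:split]))
valid_queue = torch.utils.data.DataLoader(
    train_data, batch_size=64,
    sampler=torch.utils.data.SubsetRandomSampler(indices[split:]))
```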
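
For the Experiment Setup row, the reported hyperparameters map onto standard PyTorch optimizers roughly as sketched below. `model.weights()` and `model.alphas()` are hypothetical accessors for the weight and architecture parameter groups; substitute whatever your model actually exposes.

```python
import torch

def make_search_optimizers(model, epochs=50):
    # Momentum SGD for the weights w: lr 0.025 annealed to zero by a cosine
    # schedule without restart, momentum 0.9, weight decay 3e-4.
    w_optimizer = torch.optim.SGD(
        model.weights(), lr=0.025, momentum=0.9, weight_decay=3e-4)
    w_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        w_optimizer, T_max=epochs, eta_min=0.0)

    # Adam for the architecture variables alpha: lr 3e-4, betas (0.5, 0.999),
    # weight decay 1e-3.
    alpha_optimizer = torch.optim.Adam(
        model.alphas(), lr=3e-4, betas=(0.5, 0.999), weight_decay=1e-3)
    return w_optimizer, w_scheduler, alpha_optimizer
```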