DARTS: Differentiable Architecture Search
Authors: Hanxiao Liu, Karen Simonyan, Yiming Yang
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on CIFAR-10, ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques. |
| Researcher Affiliation | Collaboration | Hanxiao Liu CMU hanxiaol@cs.cmu.edu Karen Simonyan DeepMind simonyan@google.com Yiming Yang CMU yiming@cs.cmu.edu Current affiliation: Google Brain. |
| Pseudocode | Yes | Algorithm 1: DARTS Differentiable Architecture Search (a minimal sketch of its alternating update appears below the table) |
| Open Source Code | Yes | Our implementation has been made publicly available to facilitate further research on efficient architecture search algorithms. The implementation of DARTS is available at https://github.com/quark0/darts |
| Open Datasets | Yes | Extensive experiments on CIFAR-10, ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling... |
| Dataset Splits | Yes | To carry out architecture search, we hold out half of the CIFAR-10 training data as the validation set. We run DARTS four times with different random seeds and pick the best cell based on its validation performance obtained by training from scratch for a short period (100 epochs on CIFAR-10 and 300 epochs on PTB). (A data-split sketch appears below the table.) |
| Hardware Specification | Yes | All of our experiments were performed using NVIDIA GTX 1080Ti GPUs. The training takes 3 days on a single 1080Ti GPU with our PyTorch implementation. |
| Software Dependencies | No | The paper mentions using 'PyTorch' but does not specify a version number for it or any other key software dependencies. |
| Experiment Setup | Yes | A small network of 8 cells is trained using DARTS for 50 epochs, with batch size 64 (for both the training and validation sets) and the initial number of channels 16. We use momentum SGD to optimize the weights w, with initial learning rate ηw = 0.025 (annealed down to zero following a cosine schedule without restart (Loshchilov & Hutter, 2016)), momentum 0.9, and weight decay 3×10⁻⁴. We use zero initialization for architecture variables (the α's in both the normal and reduction cells)... We use Adam (Kingma & Ba, 2014) as the optimizer for α, with initial learning rate ηα = 3×10⁻⁴, momentum β = (0.5, 0.999) and weight decay 10⁻³. (An illustrative optimizer setup is sketched below the table.) |
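
The Pseudocode row cites Algorithm 1, which alternates a descent step on the architecture parameters α (using validation data) with a descent step on the weights w (using training data). Below is a minimal sketch of that alternation in its first-order form (ξ = 0; the paper also evaluates a second-order variant with a virtual gradient step, which is omitted here). The `model`, `criterion`, and optimizer arguments are assumed interfaces, not the authors' code.

```python
def search_step(model, criterion, w_optimizer, alpha_optimizer,
                train_batch, valid_batch):
    """One alternating update of DARTS (first-order approximation)."""
    x_train, y_train = train_batch
    x_valid, y_valid = valid_batch

    # Step 1: update architecture parameters alpha by descending the
    # validation loss (alpha_optimizer holds only the alpha tensors).
    alpha_optimizer.zero_grad()
    criterion(model(x_valid), y_valid).backward()
    alpha_optimizer.step()

    # Step 2: update network weights w by descending the training loss
    # (w_optimizer holds only the weight tensors).
    w_optimizer.zero_grad()
    criterion(model(x_train), y_train).backward()
    w_optimizer.step()
```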
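
For the Dataset Splits row, the paper holds out half of the 50,000 CIFAR-10 training images as the architecture-search validation set. A minimal sketch of such a 50/50 split with torchvision follows; the `ToTensor`-only transform and batch size of 64 are illustrative assumptions, not the authors' exact preprocessing.

```python
import numpy as np
import torch
import torchvision
import torchvision.transforms as transforms

train_data = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True,
    transform=transforms.ToTensor())

indices = np.random.permutation(len(train_data))  # 50,000 training images
split = len(indices) // 2                         # 25,000 / 25,000

# One queue for updating the weights w, one for updating alpha.
train_queue = torch.utils.data.DataLoader(
    train_data, batch_size=64,
    sampler=torch.utils.data.SubsetRandomSampler(indices[:split]))
valid_queue = torch.utils.data.DataLoader(
    train_data, batch_size=64,
    sampler=torch.utils.data.SubsetRandomSampler(indices[split:]))
```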
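
For the Experiment Setup row, the reported hyperparameters map onto standard PyTorch optimizers roughly as sketched below. `model.weights()` and `model.alphas()` are hypothetical accessors for the weight and architecture parameter groups; substitute whatever your model actually exposes.

```python
import torch

def make_search_optimizers(model, epochs=50):
    # Momentum SGD for the weights w: lr 0.025 annealed to zero by a cosine
    # schedule without restart, momentum 0.9, weight decay 3e-4.
    w_optimizer = torch.optim.SGD(
        model.weights(), lr=0.025, momentum=0.9, weight_decay=3e-4)
    w_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        w_optimizer, T_max=epochs, eta_min=0.0)

    # Adam for the architecture variables alpha: lr 3e-4, betas (0.5, 0.999),
    # weight decay 1e-3.
    alpha_optimizer = torch.optim.Adam(
        model.alphas(), lr=3e-4, betas=(0.5, 0.999), weight_decay=1e-3)
    return w_optimizer, w_scheduler, alpha_optimizer
```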