Stabilizing Differentiable Architecture Search via Perturbation-based Regularization
Authors: Xiangning Chen, Cho-Jui Hsieh
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We achieve performance gain across various search spaces on 4 datasets. The search trajectory on NAS-Bench-1Shot1 demonstrates the effectiveness of our approach. The proposed methods consistently improve DARTS-based methods and can match or improve state-of-the-art results on various search spaces of CIFAR-10, ImageNet, and Penn Treebank. |
| Researcher Affiliation | Academia | Department of Computer Science, UCLA. Correspondence to: Xiangning Chen <xiangning@cs.ucla.edu>. |
| Pseudocode | Yes | Algorithm 1 Training of SDARTS |
| Open Source Code | Yes | Our code is available at https://github.com/xiangning-chen/SmoothDARTS. |
| Open Datasets | Yes | NAS-Bench-1Shot1 is a benchmark architecture dataset (Zela et al., 2020b) covering 3 search spaces based on CIFAR-10. It provides a mapping between the continuous space of differentiable NAS and the discrete space of NAS-Bench-101 (Ying et al., 2019), the first architecture dataset proposed to lower the entry barrier of NAS. |
| Dataset Splits | Yes | NAS-Bench-1Shot1 is a benchmark architecture dataset (Zela et al., 2020b) covering 3 search spaces based on CIFAR-10. Table 3: Comparison with state-of-the-art language models on PTB (lower perplexity is better); columns: Architecture, Perplexity (valid / test), Params (M). |
| Hardware Specification | Yes | To search for 100 epochs on a single NVIDIA GTX 1080 Ti GPU, ENAS (Pham et al., 2018), DARTS (Liu et al., 2019), GDAS (Dong & Yang, 2019), NASP (Yao et al., 2020b), and PC-DARTS (Xu et al., 2020) require 10.5h, 8h, 4.5h, 5h, and 6h respectively. |
| Software Dependencies | No | The paper mentions optimizers like SGD and refers to frameworks implicitly through cited works (e.g., DARTS which is often implemented in PyTorch), but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We train the network for 250 epochs by an SGD optimizer with an annealing learning rate initialized as 0.5, a momentum of 0.9, and a weight decay of 3 × 10⁻⁵. When searching, we train the RNN network for 50 epochs with sequence length as 35. During evaluation, the final architecture is trained by an SGD optimizer, where the batch size is set as 64 and the learning rate is fixed as 20. |
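The Pseudocode row above points to "Algorithm 1 Training of SDARTS". For orientation only, the snippet below sketches the random-smoothing variant (SDARTS-RS) that the paper describes: the architecture parameters are perturbed with small uniform noise before each weight update, then updated on the validation loss as in first-order DARTS. Every identifier here (`model`, `alpha`, `criterion`, `epsilon`, and the assumption that `model(x, alpha)` accepts the architecture parameters) is a hypothetical placeholder, not the authors' API; the released SmoothDARTS code is the authoritative reference.

```python
# Minimal sketch of the random-smoothing idea behind SDARTS-RS,
# NOT the authors' implementation. All names are placeholders.
import torch

def sdarts_rs_step(model, alpha, w_optimizer, a_optimizer,
                   train_batch, valid_batch, criterion, epsilon=1e-3):
    x_train, y_train = train_batch
    x_valid, y_valid = valid_batch

    # 1) Perturb the architecture parameters with uniform noise in
    #    [-epsilon, epsilon] before the weight update, so the weights are
    #    trained to be robust to small changes in alpha (the smoothing idea).
    noise = torch.empty_like(alpha).uniform_(-epsilon, epsilon)
    with torch.no_grad():
        alpha.add_(noise)

    w_optimizer.zero_grad()
    criterion(model(x_train, alpha), y_train).backward()
    w_optimizer.step()

    # Restore alpha to its unperturbed value.
    with torch.no_grad():
        alpha.sub_(noise)

    # 2) Update the architecture parameters on the validation loss
    #    (first-order DARTS-style update).
    a_optimizer.zero_grad()
    criterion(model(x_valid, alpha), y_valid).backward()
    a_optimizer.step()
```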
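The Experiment Setup row quotes SGD hyperparameters (250 epochs, initial learning rate 0.5, momentum 0.9, weight decay 3 × 10⁻⁵). Below is a minimal PyTorch sketch of that configuration; since the quote says only "annealing", the cosine schedule and the placeholder `model` are assumptions rather than the authors' exact setup.

```python
# Hedged sketch of the quoted optimizer configuration; `model` is a placeholder
# and the cosine schedule is an illustrative choice for "annealing learning rate".
import torch

model = torch.nn.Linear(10, 10)   # placeholder network
num_epochs = 250                  # "train the network for 250 epochs"

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.5,                       # annealing learning rate initialized as 0.5
    momentum=0.9,                 # momentum of 0.9
    weight_decay=3e-5,            # weight decay of 3 * 10^-5
)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    # ... one training epoch over the data would go here ...
    scheduler.step()
```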