Stabilizing Differentiable Architecture Search via Perturbation-based Regularization
Authors: Xiangning Chen, Cho-Jui Hsieh
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We achieve performance gain across various search spaces on 4 datasets. The search trajectory on NAS-Bench-1Shot1 demonstrates the effectiveness of our approach. The proposed methods consistently improve DARTS-based methods and can match or improve state-of-the-art results on various search spaces of CIFAR-10, ImageNet, and Penn Treebank. |
| Researcher Affiliation | Academia | Department of Computer Science, UCLA. Correspondence to: Xiangning Chen <xiangning@cs.ucla.edu>. |
| Pseudocode | Yes | Algorithm 1 Training of SDARTS |
| Open Source Code | Yes | Our code is available at https://github.com/xiangning-chen/SmoothDARTS. |
| Open Datasets | Yes | NAS-Bench-1Shot1 is a benchmark architecture dataset (Zela et al., 2020b) covering 3 search spaces based on CIFAR-10. It provides a mapping between the continuous space of differentiable NAS and the discrete space of NAS-Bench-101 (Ying et al., 2019), the first architecture dataset proposed to lower the entry barrier of NAS. |
| Dataset Splits | Yes | NAS-Bench-1Shot1 is a benchmark architecture dataset (Zela et al., 2020b) covering 3 search spaces based on CIFAR-10. Table 3: Comparison with state-of-the-art language models on PTB (lower perplexity is better); columns: Architecture, Perplexity (valid / test), Params (M). |
| Hardware Specification | Yes | To search for 100 epochs on a single NVIDIA GTX 1080 Ti GPU, ENAS (Pham et al., 2018), DARTS (Liu et al., 2019), GDAS (Dong & Yang, 2019), NASP (Yao et al., 2020b), and PC-DARTS (Xu et al., 2020) require 10.5h, 8h, 4.5h, 5h, and 6h respectively. |
| Software Dependencies | No | The paper mentions optimizers like SGD and refers to frameworks implicitly through cited works (e.g., DARTS which is often implemented in PyTorch), but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We train the network for 250 epochs by an SGD optimizer with an annealing learning rate initialized as 0.5, a momentum of 0.9, and a weight decay of 3 × 10⁻⁵. When searching, we train the RNN network for 50 epochs with sequence length as 35. During evaluation, the final architecture is trained by an SGD optimizer, where the batch size is set as 64 and the learning rate is fixed as 20. |
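The Pseudocode row above points to "Algorithm 1 Training of SDARTS". For orientation only, the snippet below sketches the random-smoothing variant (SDARTS-RS) that the paper describes: the architecture parameters are perturbed with small uniform noise before each weight update, then updated on the validation loss as in first-order DARTS. Every identifier here (`model`, `alpha`, `criterion`, `epsilon`, and the assumption that `model(x, alpha)` accepts the architecture parameters) is a hypothetical placeholder, not the authors' API; the released SmoothDARTS code is the authoritative reference.

```python
# Minimal sketch of the random-smoothing idea behind SDARTS-RS,
# NOT the authors' implementation. All names are placeholders.
import torch

def sdarts_rs_step(model, alpha, w_optimizer, a_optimizer,
                   train_batch, valid_batch, criterion, epsilon=1e-3):
    x_train, y_train = train_batch
    x_valid, y_valid = valid_batch

    # 1) Perturb the architecture parameters with uniform noise in
    #    [-epsilon, epsilon] before the weight update, so the weights are
    #    trained to be robust to small changes in alpha (the smoothing idea).
    noise = torch.empty_like(alpha).uniform_(-epsilon, epsilon)
    with torch.no_grad():
        alpha.add_(noise)

    w_optimizer.zero_grad()
    criterion(model(x_train, alpha), y_train).backward()
    w_optimizer.step()

    # Restore alpha to its unperturbed value.
    with torch.no_grad():
        alpha.sub_(noise)

    # 2) Update the architecture parameters on the validation loss
    #    (first-order DARTS-style update).
    a_optimizer.zero_grad()
    criterion(model(x_valid, alpha), y_valid).backward()
    a_optimizer.step()
```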
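The Experiment Setup row quotes SGD hyperparameters (250 epochs, initial learning rate 0.5, momentum 0.9, weight decay 3 × 10⁻⁵). Below is a minimal PyTorch sketch of that configuration; since the quote says only "annealing", the cosine schedule and the placeholder `model` are assumptions rather than the authors' exact setup.

```python
# Hedged sketch of the quoted optimizer configuration; `model` is a placeholder
# and the cosine schedule is an illustrative choice for "annealing learning rate".
import torch

model = torch.nn.Linear(10, 10)   # placeholder network
num_epochs = 250                  # "train the network for 250 epochs"

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.5,                       # annealing learning rate initialized as 0.5
    momentum=0.9,                 # momentum of 0.9
    weight_decay=3e-5,            # weight decay of 3 * 10^-5
)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    # ... one training epoch over the data would go here ...
    scheduler.step()
```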