DrNAS: Dirichlet Neural Architecture Search

Authors: Xiangning Chen, Ruochen Wang, Minhao Cheng, Xiaocheng Tang, Cho-Jui Hsieh

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the effectiveness of our method. Specifically, we obtain a test error of 2.46% for CIFAR-10, 23.7% for ImageNet under the mobile setting. On NAS-Bench-201, we also achieve state-of-the-art results on all three datasets and provide insights for the effective design of neural architecture search algorithms.
Researcher Affiliation | Collaboration | Department of Computer Science, UCLA; DiDi AI Labs. {xiangning, chohsieh}@cs.ucla.edu; {ruocwang, mhcheng}@ucla.edu; xiaochengtang@didiglobal.com
Pseudocode | No | The paper describes its methods using mathematical formulations and textual descriptions but does not contain structured pseudocode or algorithm blocks (e.g., clearly labeled algorithm sections or code-like formatted procedures).
Open Source Code | Yes | Our code is available at https://github.com/xiangning-chen/DrNAS.
Open Datasets | Yes | We conduct extensive experiments on different datasets and search spaces to demonstrate DrNAS's effectiveness. Based on the DARTS search space (Liu et al., 2019), we achieve an average error rate of 2.46% on CIFAR-10... On NAS-Bench-201 (Dong & Yang, 2020), we also set new state-of-the-art results on all three datasets... NAS-Bench-201 provides support for 3 datasets (CIFAR-10, CIFAR-100, ImageNet-16-120 (Chrabaszcz et al., 2017)).
Dataset Splits | Yes | We equally divide the 50K training images into two parts: one is used for optimizing the network weights by momentum SGD and the other for learning the Dirichlet architecture distribution by an Adam optimizer.
Hardware Specification | Yes | In the first stage, we set the partial channel parameter K as 6 to fit the super-network into a single GTX 1080Ti GPU with 11GB memory, i.e., only 1/6 of the features are sampled on each edge.
Software Dependencies | No | The paper mentions optimizers (momentum SGD, Adam), learning rate schedulers (cosine annealing, warm-up), and regularization techniques (cutout, drop-path, label smoothing) but does not provide specific software names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup | Yes | Search settings: We equally divide the 50K training images into two parts: one is used for optimizing the network weights by momentum SGD and the other for learning the Dirichlet architecture distribution by an Adam optimizer. Since the Dirichlet concentration β must be positive, we apply the shifted exponential linear mapping β = ELU(η) + 1 and optimize over η instead. We use the l2 norm to constrain the distance between η and the anchor η̂ = 0. η is initialized by a standard Gaussian with scale 0.001, and λ in (2) is set to 0.001... Retrain settings: The evaluation phase uses the entire 50K training set to train the network from scratch for 600 epochs. The network weight is optimized by an SGD optimizer with a cosine annealing learning rate initialized as 0.025, a momentum of 0.9, and a weight decay of 3 × 10^-4. To allow a fair comparison with previous work, we also employ cutout regularization with length 16, drop-path (Zoph et al., 2018) with probability 0.3, and an auxiliary tower of weight 0.4.
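
The search-phase details quoted in the Dataset Splits and Experiment Setup rows (an even split of the 50K CIFAR-10 training images, momentum SGD for the network weights, Adam for the Dirichlet concentration β = ELU(η) + 1 with an l2 anchor penalty of weight λ = 0.001, and η initialized from a Gaussian of scale 0.001) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation (see the linked repository for that): the super-network is replaced by a tiny stand-in, and the batch size and both learning rates are illustrative assumptions.

```python
# Minimal sketch of the DrNAS search-phase setup quoted above, NOT the
# authors' code: equal split of the CIFAR-10 training set, momentum SGD for
# network weights, Adam for the Dirichlet concentration beta = ELU(eta) + 1
# with an l2 anchor penalty lambda * ||eta - 0||^2. TinySuperNet, the batch
# size, and both learning rates are illustrative assumptions.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, random_split
import torchvision
import torchvision.transforms as T

class TinySuperNet(torch.nn.Module):
    """Stand-in for the weight-sharing super-network (hypothetical)."""
    def __init__(self):
        super().__init__()
        self.head = torch.nn.Sequential(torch.nn.Flatten(),
                                        torch.nn.Linear(3 * 32 * 32, 10))
    def forward(self, x, arch_weights):
        # A real super-network mixes candidate ops on every edge using
        # arch_weights; the stub only scales features so gradients reach eta.
        return self.head(x) * arch_weights.mean()

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=T.ToTensor())
weight_set, arch_set = random_split(train_set, [25000, 25000])  # equal split of 50K
weight_loader = DataLoader(weight_set, batch_size=64, shuffle=True)
arch_loader = DataLoader(arch_set, batch_size=64, shuffle=True)

num_edges, num_ops = 14, 7                        # DARTS-like space (illustrative)
eta = torch.nn.Parameter(0.001 * torch.randn(num_edges, num_ops))  # Gaussian, scale 0.001
model = TinySuperNet()
lam = 0.001                                        # weight of the l2 anchor term

w_opt = torch.optim.SGD(model.parameters(), lr=0.025, momentum=0.9, weight_decay=3e-4)
a_opt = torch.optim.Adam([eta], lr=3e-4)           # Adam lr is an assumption

def concentration(eta):
    # Shifted ELU keeps every Dirichlet concentration strictly positive.
    return F.elu(eta) + 1.0

for (xw, yw), (xa, ya) in zip(weight_loader, arch_loader):
    # Architecture step: sample op weights from Dirichlet(beta), update eta.
    beta = concentration(eta)
    arch_w = torch.distributions.Dirichlet(beta).rsample()
    a_loss = F.cross_entropy(model(xa, arch_w), ya) + lam * eta.pow(2).sum()
    a_opt.zero_grad(); a_loss.backward(); a_opt.step()

    # Weight step: resample from the updated distribution, update the weights.
    beta = concentration(eta)
    arch_w = torch.distributions.Dirichlet(beta).rsample().detach()
    w_loss = F.cross_entropy(model(xw, arch_w), yw)
    w_opt.zero_grad(); w_loss.backward(); w_opt.step()
```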
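
The memory figure in the Hardware Specification row follows from partial channel connections: with K = 6, only 1/6 of the channels on each edge pass through the candidate operations while the rest bypass them. The sketch below illustrates that idea only; it is not the paper's implementation, `mixed_op` is a placeholder for the weighted sum of candidate operations, and the channel shuffle used in PC-DARTS-style code is omitted.

```python
# Sketch of partial-channel sampling with K = 6, used here only to illustrate
# the "1/6 of the features are sampled on each edge" statement.
import torch

def partial_channel_edge(x, mixed_op, K=6):
    """x: (N, C, H, W). Only C // K channels go through the candidate ops."""
    C = x.size(1)
    idx = torch.randperm(C)
    sampled, bypass = idx[: C // K], idx[C // K:]
    out_sampled = mixed_op(x[:, sampled])   # 1/K of the features are processed
    out_bypass = x[:, bypass]               # the rest bypass the candidate ops
    return torch.cat([out_sampled, out_bypass], dim=1)

# Shape check with an identity stand-in for the mixed operation.
x = torch.randn(2, 36, 8, 8)
print(partial_channel_edge(x, torch.nn.Identity()).shape)  # torch.Size([2, 36, 8, 8])
```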
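
The retraining recipe in the Experiment Setup row (600 epochs on the full training set, SGD with cosine annealing from 0.025, momentum 0.9, weight decay 3 × 10^-4, cutout of length 16, drop-path 0.3, auxiliary tower weight 0.4) maps onto standard PyTorch components. The sketch below only wires up the optimizer, schedule, and augmentation under stated assumptions: the evaluated network is a placeholder, the crop/flip augmentation and normalization statistics are conventional CIFAR-10 choices rather than values quoted from the paper, and cutout is approximated with torchvision's RandomErasing.

```python
# Sketch of the retraining hyperparameters quoted above, not the authors'
# script: SGD + cosine annealing over 600 epochs, weight decay 3e-4, and a
# cutout-like 16x16 erasure. `net` is a placeholder for the discretized
# architecture; drop-path (p = 0.3) and the auxiliary tower are only noted.
import torch
import torchvision.transforms as T

EPOCHS = 600
net = torch.nn.Sequential(torch.nn.Flatten(),
                          torch.nn.Linear(3 * 32 * 32, 10))  # placeholder network

optimizer = torch.optim.SGD(net.parameters(), lr=0.025, momentum=0.9,
                            weight_decay=3e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
# scheduler.step() would be called once per epoch during training.

train_transform = T.Compose([
    T.RandomCrop(32, padding=4),        # conventional CIFAR-10 augmentation (assumption)
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    # Cutout with length 16 approximated by erasing a 16x16 square
    # (1/4 of the 32x32 image area) at a fixed square aspect ratio.
    T.RandomErasing(p=1.0, scale=(0.25, 0.25), ratio=(1.0, 1.0), value=0),
])

# Per batch, the total loss would combine the main head with the auxiliary
# tower, e.g. loss = ce(logits, y) + 0.4 * ce(logits_aux, y), and drop-path
# with probability 0.3 would be applied inside the cells.
```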