iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients

Authors: Miao Zhang, Steven W. Su, Shirui Pan, Xiaojun Chang, Ehsan M Abbasnejad, Reza Haffari

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on two NAS benchmark search spaces and the common NAS search space verify the effectiveness of our proposed method. It leads to architectures outperforming, with large margins, those learned by the baseline methods.
Researcher Affiliation | Academia | (1) Faculty of Information Technology, Monash University, Australia; (2) Faculty of Engineering and Information Technology, University of Technology Sydney, Australia; (3) Australian Institute for Machine Learning, University of Adelaide, Australia.
Pseudocode | Yes | Algorithm 1 (iDARTS). Input: D_train and D_val; initialized supernet weights w and operation magnitudes α. While not converged: (1) sample batches from D_train and update the supernet weights w based on the cross-entropy loss with T steps; (2) compute the Hessian matrix ∇²_w L_1(w, α); (3) sample a batch from D_val, calculate the hypergradient ∇_α L̂_2(w(α), α) based on Eq. (9), and update α ← α − γ_α ∇_α L̂_2(w(α), α). End while. Obtain the final architecture from α through argmax. (A hedged PyTorch-style sketch of this hypergradient update follows the table.)
Open Source Code | No | The paper does not explicitly provide a link to its source code or state that it is available in supplementary materials.
Open Datasets | Yes | We consider three different cases to analyze iDARTS, including two NAS benchmark datasets, NAS-Bench-1Shot1 (Zela et al., 2020b) and NAS-Bench-201 (Dong & Yang, 2020), and the common DARTS search space (Liu et al., 2019). NAS-Bench-1Shot1 is built from the NAS-Bench-101 benchmark dataset (Ying et al., 2019)... The search space of NAS-Bench-201 is much simpler than NAS-Bench-1Shot1, while it contains the performance of CIFAR-10, CIFAR-100, and ImageNet for all architectures in this search space.
Dataset Splits | Yes | Figure 1 plots the mean and standard deviation of the validation and test errors for iDARTS and DARTS, tracking the performance during the architecture search on the NAS-Bench-1Shot1 dataset. The search procedure needs to look for two different types of cells on CIFAR-10... The best-found cell on CIFAR-10 is then transferred to the CIFAR-100 and ImageNet datasets to evaluate its transferability. All experiment settings follow DARTS for fair comparisons.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments, only mentioning general experimental settings.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We consider three different cases to analyze iDARTS, including two NAS benchmark datasets, NAS-Bench-1Shot1 (Zela et al., 2020b) and NAS-Bench-201 (Dong & Yang, 2020), and the common DARTS search space (Liu et al., 2019). We choose the third search space in NAS-Bench-1Shot1 to analyze iDARTS... We also analyze the effects of the inner optimization steps T, plotting the performance of iDARTS with different T on NAS-Bench-1Shot1. Figure 2 (b) plots the performance of iDARTS with different learning rates γ for the inner optimization on NAS-Bench-201. Figure 2 (c) also summarizes the performance of iDARTS with different initial learning rates γ_α on NAS-Bench-201. We also apply iDARTS to a convolutional architecture search in the common DARTS search space (Liu et al., 2019) to compare with state-of-the-art NAS methods, where all experiment settings follow DARTS for fair comparisons. The search procedure needs to look for two different types of cells on CIFAR-10: a normal cell α_normal and a reduction cell α_reduce, which are stacked to form the final structure for architecture evaluation. The evaluation setting for CIFAR-100 is the same as for CIFAR-10. On the ImageNet dataset, the experiment setting differs slightly from CIFAR-10 in that only 14 cells are stacked and the number of initial channels is changed to 48. (A configuration sketch is given after the table.)
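
As referenced in the Pseudocode row, the following is a minimal PyTorch-style sketch of how the implicit hypergradient in Eq. (9) of Algorithm 1 can be approximated: it applies the implicit function theorem with a truncated Neumann series for the inverse Hessian-vector product, which is one standard way to realize stochastic implicit gradients. The function name approx_hypergradient, its argument names, and the truncation depth K are illustrative assumptions, not the authors' released code.

```python
import torch

def approx_hypergradient(val_loss, train_loss, w, alpha, K=3, inner_lr=0.01):
    """Sketch of an implicit hypergradient d L_val / d alpha.

    Uses the implicit function theorem with a truncated Neumann series to
    approximate the inverse Hessian-vector product. This is an assumed,
    generic realization -- not the paper's exact Eq. (9).

    w, alpha: lists of parameters (supernet weights, architecture magnitudes).
    """
    # Direct term: d L_val / d alpha
    direct = torch.autograd.grad(val_loss, alpha, retain_graph=True)

    # v0 = d L_val / d w, the vector the inverse Hessian is applied to
    v = list(torch.autograd.grad(val_loss, w, retain_graph=True))
    p = [vi.clone() for vi in v]

    # d L_train / d w with create_graph=True so Hessian-vector products are possible
    dtrain_dw = torch.autograd.grad(train_loss, w, create_graph=True)

    # Neumann series: H^{-1} v ~= inner_lr * sum_{k=0..K} (I - inner_lr * H)^k v
    for _ in range(K):
        Hv = torch.autograd.grad(dtrain_dw, w, grad_outputs=v, retain_graph=True)
        v = [vi - inner_lr * hvi for vi, hvi in zip(v, Hv)]
        p = [pi + vi for pi, vi in zip(p, v)]

    # Mixed second-derivative term: d/d alpha [ (d L_train / d w) . p ]
    mixed = torch.autograd.grad(dtrain_dw, alpha, grad_outputs=p, retain_graph=True)

    # Hypergradient = direct term - inner_lr * indirect term
    return [d - inner_lr * m for d, m in zip(direct, mixed)]
```

The architecture update in Algorithm 1 would then be α ← α − γ_α · hypergradient, applied after the T inner steps on the supernet weights.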
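
The Experiment Setup row describes DARTS-style evaluation configurations, of which only the ImageNet values (14 stacked cells, 48 initial channels) are stated explicitly. The sketch below shows how such a configuration might be expressed in code; the dictionary keys and the CIFAR values of 20 cells / 36 initial channels follow the usual DARTS evaluation convention and are assumptions, not values given in this report.

```python
# Hypothetical evaluation configurations inferred from the quoted setup.
# The CIFAR numbers (20 cells, 36 initial channels) are assumed from the
# standard DARTS convention; the ImageNet numbers (14 cells, 48 initial
# channels) are stated in the Experiment Setup row above.
EVAL_CONFIGS = {
    "cifar10":  {"num_cells": 20, "init_channels": 36},
    "cifar100": {"num_cells": 20, "init_channels": 36},  # same setting as CIFAR-10
    "imagenet": {"num_cells": 14, "init_channels": 48},
}

def eval_config(dataset: str) -> dict:
    """Look up the assumed evaluation configuration for a dataset."""
    return EVAL_CONFIGS[dataset]
```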