iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients
Authors: Miao Zhang, Steven W. Su, Shirui Pan, Xiaojun Chang, Ehsan M Abbasnejad, Reza Haffari
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on two NAS benchmark search spaces and the common NAS search space verify the effectiveness of our proposed method. It leads to architectures outperforming, with large margins, those learned by the baseline methods. |
| Researcher Affiliation | Academia | (1) Faculty of Information Technology, Monash University, Australia; (2) Faculty of Engineering and Information Technology, University of Technology Sydney, Australia; (3) Australian Institute for Machine Learning, University of Adelaide, Australia. |
| Pseudocode | Yes | Algorithm 1 iDARTS. Input: D_train and D_val; initialized supernet weights w and operation magnitudes α. 1: while not converged do 2: Sample batches from D_train; update the supernet weights w based on the cross-entropy loss with T steps. 3: Get the Hessian matrix ∇²_{ww} L₁. 4: Sample a batch from D_val; calculate the hypergradient ∇_α L̂ⁱ₂(w_j(α), α) based on Eq. (9), and update α with α ← α − γ_α ∇_α L̂ⁱ₂(w_j(α), α). 5: end while 6: Obtain α* through argmax. (A hedged code sketch of this update follows the table.) |
| Open Source Code | No | The paper does not explicitly provide a link to its source code or state that it is available in supplementary materials. |
| Open Datasets | Yes | We consider three different cases to analyze iDARTS, including two NAS benchmark datasets, NAS-Bench-1Shot1 (Zela et al., 2020b) and NAS-Bench-201 (Dong & Yang, 2020), and the common DARTS search space (Liu et al., 2019). The NAS-Bench-1Shot1 is built from the NAS-Bench-101 benchmark dataset (Ying et al., 2019)... The search space of NAS-Bench-201 is much simpler than NAS-Bench-1Shot1, while it contains the performance of CIFAR-10, CIFAR-100, and ImageNet for all architectures in this search space. |
| Dataset Splits | Yes | Figure 1 plots the mean and standard deviation of the validation and test errors for iDARTS and DARTS, tracking the performance during the architecture search on the NAS-Bench-1Shot1 dataset. The search procedure needs to look for two different types of cells on CIFAR-10... The best-found cell on CIFAR-10 is then transferred to the CIFAR-100 and ImageNet datasets to evaluate its transferability. All experiment settings follow DARTS for fair comparison. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments, only mentioning general experimental settings. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We consider three different cases to analyze iDARTS, including two NAS benchmark datasets, NAS-Bench-1Shot1 (Zela et al., 2020b) and NAS-Bench-201 (Dong & Yang, 2020), and the common DARTS search space (Liu et al., 2019). We choose the third search space in NAS-Bench-1Shot1 to analyze iDARTS... We also analyze the effects of the inner optimization steps T, plotting the performance of iDARTS with different T on the NAS-Bench-1Shot1. Figure 2 (b) plots the performance of iDARTS with different learning rates γ for the inner optimization on the NAS-Bench-201. Figure 2 (c) also summarizes the performance of iDARTS with different initial learning rates γ_α on the NAS-Bench-201. We also apply iDARTS to a convolutional architecture search in the common DARTS search space (Liu et al., 2019) to compare with the state-of-the-art NAS methods, where all experiment settings follow DARTS for fair comparison. The search procedure needs to look for two different types of cells on CIFAR-10: normal cell α_normal and reduction cell α_reduce, to stack more cells to form the final structure for the architecture evaluation. The evaluation setting for CIFAR-100 is the same as CIFAR-10. In the ImageNet dataset, the experiment setting is slightly different from CIFAR-10 in that only 14 cells are stacked, and the number of initial channels is changed to 48. |
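
The Pseudocode row above quotes Algorithm 1, which alternates T weight-update steps on the training split with a single architecture step driven by an implicit hypergradient. The sketch below is a minimal, hedged illustration of that update in PyTorch: the toy two-operation supernet (`ToyNet`), the helper names (`inner_steps`, `hypergradient`), and the hyperparameters (T, K, the step sizes) are assumptions made only for illustration, and the inverse Hessian in the implicit-function-theorem hypergradient is approximated here with a truncated Neumann series rather than the paper's exact Eq. (9).

```python
# Hedged sketch of the bi-level update quoted in the Pseudocode row above.
# Everything here (ToyNet, function names, K, T, step sizes) is illustrative,
# not the authors' implementation; the inverse Hessian in the implicit
# hypergradient is approximated with a truncated Neumann series.
import itertools
import torch
import torch.nn.functional as F


class ToyNet(torch.nn.Module):
    """Toy two-operation supernet; alpha mixes the candidate operations."""
    def __init__(self, dim=16, classes=10):
        super().__init__()
        self.op1 = torch.nn.Linear(dim, classes)
        self.op2 = torch.nn.Linear(dim, classes)

    def forward(self, x, alpha):
        mix = torch.softmax(alpha, dim=0)
        return mix[0] * self.op1(x) + mix[1] * self.op2(x)


def inner_steps(net, alpha, train_batches, T, gamma):
    """T SGD steps on the training loss w.r.t. the supernet weights w (alpha fixed)."""
    for _ in range(T):
        x, y = next(train_batches)
        loss = F.cross_entropy(net(x, alpha), y)
        grads = torch.autograd.grad(loss, list(net.parameters()))
        with torch.no_grad():
            for w, g in zip(net.parameters(), grads):
                w -= gamma * g


def hypergradient(net, alpha, train_batch, val_batch, K, gamma):
    """Approximate dL_val/dalpha via the implicit function theorem,
    with the inverse Hessian replaced by a K-term Neumann series."""
    params = list(net.parameters())
    x_t, y_t = train_batch
    x_v, y_v = val_batch

    train_loss = F.cross_entropy(net(x_t, alpha), y_t)
    dtrain_dw = torch.autograd.grad(train_loss, params, create_graph=True)

    val_loss = F.cross_entropy(net(x_v, alpha), y_v)
    dval_dw = torch.autograd.grad(val_loss, params, retain_graph=True)
    dval_dalpha = torch.autograd.grad(val_loss, alpha)[0]

    # Neumann series: H^{-1} v  ~  gamma * sum_{k=0..K} (I - gamma * H)^k v
    v = [g.detach().clone() for g in dval_dw]
    acc = [g.detach().clone() for g in dval_dw]
    for _ in range(K):
        hvp = torch.autograd.grad(dtrain_dw, params, grad_outputs=v, retain_graph=True)
        v = [vi - gamma * h for vi, h in zip(v, hvp)]
        acc = [a + vi for a, vi in zip(acc, v)]

    # Mixed second derivative: d/dalpha of (dL_train/dw . inverse-Hessian-vector product)
    mixed = torch.autograd.grad(dtrain_dw, alpha, grad_outputs=acc)[0]
    return dval_dalpha - gamma * mixed


# Illustrative usage with random data (batch sizes and learning rates are arbitrary).
net = ToyNet()
alpha = torch.zeros(2, requires_grad=True)
x_t, y_t = torch.randn(8, 16), torch.randint(0, 10, (8,))
x_v, y_v = torch.randn(8, 16), torch.randint(0, 10, (8,))

inner_steps(net, alpha, itertools.repeat((x_t, y_t)), T=5, gamma=0.01)
g_alpha = hypergradient(net, alpha, (x_t, y_t), (x_v, y_v), K=3, gamma=0.01)
with torch.no_grad():
    alpha -= 0.1 * g_alpha  # gamma_alpha = 0.1, purely illustrative
```

The Neumann truncation is what keeps the architecture step to a handful of Hessian-vector products instead of an explicit second-order solve; the final argmax over α (line 6 of the quoted algorithm) would then simply pick the candidate operation with the largest magnitude on each edge.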