ZARTS: On Zero-order Optimization for Neural Architecture Search
Authors: Xiaoxing Wang, Wenxuan Guo, Jianlin Su, Xiaokang Yang, Junchi Yan
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | However, our in-depth empirical results show that the approximation often distorts the loss landscape, leading to a biased objective to optimize and, in turn, inaccurate gradient estimation for architecture parameters. This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the above approximation. Specifically, three representative zero-order optimization methods are introduced: RS, MGS, and GLD, among which MGS performs best by balancing accuracy and speed. Moreover, we explore the connections between RS/MGS and the gradient descent algorithm and show that our ZARTS can be seen as a robust gradient-free counterpart to DARTS. Extensive experiments on multiple datasets and search spaces show the remarkable performance of our method. In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue. (An illustrative zero-order update sketch follows the table.) |
| Researcher Affiliation | Collaboration | Xiaoxing Wang1, Wenxuan Guo1, Jianlin Su2, Xiaokang Yang1, Junchi Yan1 1Department of Computer Science and Engineering, Shanghai Jiao Tong University 2Shenzhen Zhuiyi Technology Co., Ltd. |
| Pseudocode | Yes | Algorithm 1 ZARTS: Zero-order Optimization Framework for Architecture Search |
| Open Source Code | Yes | Source code will be made publicly available at: https://github.com/vicFigure/ZARTS. ... Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] The codes are included in the supplementary material. |
| Open Datasets | Yes | We first verify the stability of ZARTS (with three zero-order optimization methods RS, MGS, GLD) on the four popular search spaces of R-DARTS [38] on three datasets including CIFAR-10 [17], CIFAR-100 [17], and SVHN [24]. |
| Dataset Splits | Yes | We train 250 epochs with a batch size of 1024 by SGD with a momentum of 0.9 and a base learning rate of 0.5. We utilize the same data pre-processing strategies and auxiliary classifiers as DARTS. ... (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Section 2.2 and 2.3 in supplementary material. |
| Hardware Specification | Yes | All the experiments are conducted on NVIDIA 2080Ti. ... (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] All the experiments are conducted on a single NVIDIA 2080Ti. |
| Software Dependencies | No | The paper mentions general software like 'SGD optimizer' but does not specify version numbers for any programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow), or specific software packages. While the checklist mentions 'Section 2.2 and 2.3 in supplementary material' for training details, the main paper does not provide the required versioned software dependencies. |
| Experiment Setup | Yes | We train 250 epochs with a batch size of 1024 by SGD with a momentum of 0.9 and a base learning rate of 0.5. We utilize the same data pre-processing strategies and auxiliary classifiers as DARTS. ... (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Section 2.2 and 2.3 in supplementary material. |
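
The abstract excerpt under "Research Type" describes replacing DARTS' gradient-based architecture update with zero-order optimizers (RS, MGS, GLD) for the architecture parameters. The snippet below is a minimal, hypothetical PyTorch sketch of an RS-style two-point zero-order gradient estimate and update, not the ZARTS reference implementation (see the GitHub link above for that); names such as `zo_grad_estimate`, `zo_arch_step`, and the `val_loss` callable are illustrative assumptions.

```python
import torch


def zo_grad_estimate(alpha, val_loss, sigma=1e-3, num_samples=8):
    """Two-point zero-order estimate of d(val_loss)/d(alpha).

    alpha:       tensor of architecture parameters.
    val_loss:    callable mapping an alpha tensor to a scalar loss value
                 (e.g. validation loss of the supernet under that alpha).
    sigma:       probe radius; num_samples: number of random directions.
    """
    with torch.no_grad():
        base = val_loss(alpha)
        grad = torch.zeros_like(alpha)
        for _ in range(num_samples):
            u = torch.randn_like(alpha)                 # random probe direction
            delta = val_loss(alpha + sigma * u) - base  # finite difference along u
            grad += (delta / sigma) * u
        return grad / num_samples


def zo_arch_step(alpha, val_loss, lr=3e-4, **kwargs):
    """One gradient-free update of the architecture parameters."""
    with torch.no_grad():
        return alpha - lr * zo_grad_estimate(alpha, val_loss, **kwargs)
```

In a DARTS-style bi-level loop, one would alternate a few SGD steps on the supernet weights with such an outer zero-order step on alpha; the actual RS, MGS, and GLD strategies in ZARTS differ in how perturbations are sampled and candidates accepted, as laid out in the paper's Algorithm 1.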
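
The retraining hyperparameters quoted under "Dataset Splits" and "Experiment Setup" (250 epochs, batch size 1024, SGD with momentum 0.9, base learning rate 0.5) map onto a standard PyTorch training loop. Below is a minimal sketch using only those quoted values; the function name `retrain_searched_model` is illustrative, and the weight decay, learning-rate schedule, auxiliary-classifier loss, and DARTS-style pre-processing mentioned in the paper are deliberately omitted rather than guessed.

```python
import torch
from torch import nn


def retrain_searched_model(model: nn.Module, train_loader,
                           epochs: int = 250, base_lr: float = 0.5,
                           momentum: float = 0.9) -> nn.Module:
    """Retrain a searched architecture with the quoted hyperparameters.

    `model` and `train_loader` (built with batch_size=1024) are supplied by
    the caller. Weight decay, the LR schedule, the auxiliary-classifier loss,
    and the DARTS data pre-processing are not shown here.
    """
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=momentum)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```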