ZARTS: On Zero-order Optimization for Neural Architecture Search

Authors: Xiaoxing Wang, Wenxuan Guo, Jianlin Su, Xiaokang Yang, Junchi Yan

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | However, our in-depth empirical results show that the approximation often distorts the loss landscape, leading to a biased objective to optimize and, in turn, inaccurate gradient estimation for architecture parameters. This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the above approximation. Specifically, three representative zero-order optimization methods are introduced: RS, MGS, and GLD, among which MGS performs best by balancing accuracy and speed. Moreover, we explore the connections between RS/MGS and the gradient descent algorithm and show that our ZARTS can be seen as a robust gradient-free counterpart to DARTS. Extensive experiments on multiple datasets and search spaces show the remarkable performance of our method. In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue. (A generic zero-order update sketch is given after this table.)
Researcher Affiliation | Collaboration | Xiaoxing Wang (1), Wenxuan Guo (1), Jianlin Su (2), Xiaokang Yang (1), Junchi Yan (1); (1) Department of Computer Science and Engineering, Shanghai Jiao Tong University; (2) Shenzhen Zhuiyi Technology Co., Ltd.
Pseudocode | Yes | Algorithm 1, "ZARTS: Zero-order Optimization Framework for Architecture Search"
Open Source Code | Yes | Source code will be made publicly available at: https://github.com/vicFigure/ZARTS. ... Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] The code is included in the supplementary material.
Open Datasets | Yes | We first verify the stability of ZARTS (with three zero-order optimization methods RS, MGS, GLD) on the four popular search spaces of R-DARTS [38] on three datasets including CIFAR-10 [17], CIFAR-100 [17], and SVHN [24].
Dataset Splits | Yes | We train 250 epochs with a batch size of 1024 by SGD with a momentum of 0.9 and a base learning rate of 0.5. We utilize the same data pre-processing strategies and auxiliary classifiers as DARTS. ... (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Sections 2.2 and 2.3 in the supplementary material.
Hardware Specification | Yes | All the experiments are conducted on NVIDIA 2080Ti. ... (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] All the experiments are conducted on a single NVIDIA 2080Ti.
Software Dependencies | No | The paper mentions general software such as the SGD optimizer but does not specify version numbers for any programming language (e.g., Python), library (e.g., PyTorch, TensorFlow), or other software package. While the checklist points to "Section 2.2 and 2.3 in supplementary material" for training details, the main paper does not provide versioned software dependencies.
Experiment Setup | Yes | We train 250 epochs with a batch size of 1024 by SGD with a momentum of 0.9 and a base learning rate of 0.5. We utilize the same data pre-processing strategies and auxiliary classifiers as DARTS. ... (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Sections 2.2 and 2.3 in the supplementary material. (A minimal PyTorch sketch of these training settings is given after this table.)
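
For concreteness, the sketch below illustrates the kind of zero-order update referenced in the abstract quoted under Research Type: the gradient of a validation loss with respect to the architecture parameters is estimated from forward differences along random directions, with no backpropagation through those parameters. The names (zero_order_grad, val_loss), the toy quadratic loss, and all hyperparameters are illustrative assumptions, not the paper's exact RS/MGS/GLD procedures; in the full method each loss evaluation would also involve the supernet weights, a bi-level detail omitted here for brevity.

    # Minimal sketch of a zero-order (gradient-free) update for architecture
    # parameters, assuming a generic forward-difference estimator along random
    # Gaussian directions. Hypothetical illustration only.
    import numpy as np

    def zero_order_grad(val_loss, alpha, num_dirs=8, mu=1e-2):
        """Estimate d val_loss / d alpha without differentiating through alpha."""
        base = val_loss(alpha)
        grad = np.zeros_like(alpha)
        for _ in range(num_dirs):
            u = np.random.randn(*alpha.shape)                # random search direction
            grad += (val_loss(alpha + mu * u) - base) / mu * u
        return grad / num_dirs

    # Toy quadratic stand-in for the supernet's validation loss (hypothetical).
    val_loss = lambda a: float(np.sum((a - 1.0) ** 2))

    alpha = np.zeros(4)                                      # stand-in architecture parameters
    for _ in range(100):
        alpha -= 0.05 * zero_order_grad(val_loss, alpha)     # gradient-free descent step
    print(alpha)                                             # approaches the minimizer [1, 1, 1, 1]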
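
The training details quoted under Dataset Splits and Experiment Setup (250 epochs, batch size 1024, SGD with momentum 0.9, base learning rate 0.5) can be wired up with standard PyTorch calls as sketched below. The stand-in model, random data, and absence of a learning-rate schedule are assumptions for illustration; the released code and the supplementary material define the exact recipe.

    # Sketch of the quoted training settings; placeholder model and data.
    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    model = nn.Linear(3 * 32 * 32, 10)                       # stand-in for the searched network
    dataset = TensorDataset(torch.randn(2048, 3 * 32 * 32),
                            torch.randint(0, 10, (2048,)))
    loader = DataLoader(dataset, batch_size=1024, shuffle=True)

    # Settings quoted in the table: SGD, momentum 0.9, base learning rate 0.5.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(250):                                 # 250 training epochs
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()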