ZARTS: On Zero-order Optimization for Neural Architecture Search

Authors: Xiaoxing Wang, Wenxuan Guo, Jianlin Su, Xiaokang Yang, Junchi Yan

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | However, our in-depth empirical results show that the approximation often distorts the loss landscape, leading to a biased objective to optimize and, in turn, inaccurate gradient estimation for architecture parameters. This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the above approximation. Specifically, three representative zero-order optimization methods are introduced: RS, MGS, and GLD, among which MGS performs best by balancing accuracy and speed. Moreover, we explore the connections between RS/MGS and the gradient descent algorithm and show that our ZARTS can be seen as a robust gradient-free counterpart to DARTS. Extensive experiments on multiple datasets and search spaces show the remarkable performance of our method. In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue. (A generic zero-order update sketch is given after this table.)
Researcher Affiliation | Collaboration | Xiaoxing Wang (1), Wenxuan Guo (1), Jianlin Su (2), Xiaokang Yang (1), Junchi Yan (1); (1) Department of Computer Science and Engineering, Shanghai Jiao Tong University; (2) Shenzhen Zhuiyi Technology Co., Ltd.
Pseudocode | Yes | Algorithm 1, "ZARTS: Zero-order Optimization Framework for Architecture Search"
Open Source Code | Yes | Source code will be made publicly available at: https://github.com/vicFigure/ZARTS. ... Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] The code is included in the supplementary material.
Open Datasets | Yes | We first verify the stability of ZARTS (with three zero-order optimization methods RS, MGS, GLD) on the four popular search spaces of R-DARTS [38] on three datasets including CIFAR-10 [17], CIFAR-100 [17], and SVHN [24].
Dataset Splits | Yes | We train 250 epochs with a batch size of 1024 by SGD with a momentum of 0.9 and a base learning rate of 0.5. We utilize the same data pre-processing strategies and auxiliary classifiers as DARTS. ... (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Sections 2.2 and 2.3 in the supplementary material.
Hardware Specification | Yes | All the experiments are conducted on NVIDIA 2080Ti. ... (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] All the experiments are conducted on a single NVIDIA 2080Ti.
Software Dependencies | No | The paper mentions general software such as the SGD optimizer but does not specify version numbers for any programming language (e.g., Python), library (e.g., PyTorch, TensorFlow), or other software package. While the checklist points to "Section 2.2 and 2.3 in supplementary material" for training details, the main paper does not provide versioned software dependencies.
Experiment Setup | Yes | We train 250 epochs with a batch size of 1024 by SGD with a momentum of 0.9 and a base learning rate of 0.5. We utilize the same data pre-processing strategies and auxiliary classifiers as DARTS. ... (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Sections 2.2 and 2.3 in the supplementary material. (A minimal PyTorch sketch of these training settings is given after this table.)
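
For concreteness, the sketch below illustrates the kind of zero-order update referenced in the abstract quoted under Research Type: the gradient of a validation loss with respect to the architecture parameters is estimated from forward differences along random directions, with no backpropagation through those parameters. The names (zero_order_grad, val_loss), the toy quadratic loss, and all hyperparameters are illustrative assumptions, not the paper's exact RS/MGS/GLD procedures; in the full method each loss evaluation would also involve the supernet weights, a bi-level detail omitted here for brevity.

    # Minimal sketch of a zero-order (gradient-free) update for architecture
    # parameters, assuming a generic forward-difference estimator along random
    # Gaussian directions. Hypothetical illustration only.
    import numpy as np

    def zero_order_grad(val_loss, alpha, num_dirs=8, mu=1e-2):
        """Estimate d val_loss / d alpha without differentiating through alpha."""
        base = val_loss(alpha)
        grad = np.zeros_like(alpha)
        for _ in range(num_dirs):
            u = np.random.randn(*alpha.shape)                # random search direction
            grad += (val_loss(alpha + mu * u) - base) / mu * u
        return grad / num_dirs

    # Toy quadratic stand-in for the supernet's validation loss (hypothetical).
    val_loss = lambda a: float(np.sum((a - 1.0) ** 2))

    alpha = np.zeros(4)                                      # stand-in architecture parameters
    for _ in range(100):
        alpha -= 0.05 * zero_order_grad(val_loss, alpha)     # gradient-free descent step
    print(alpha)                                             # approaches the minimizer [1, 1, 1, 1]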
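
The training details quoted under Dataset Splits and Experiment Setup (250 epochs, batch size 1024, SGD with momentum 0.9, base learning rate 0.5) can be wired up with standard PyTorch calls as sketched below. The stand-in model, random data, and absence of a learning-rate schedule are assumptions for illustration; the released code and the supplementary material define the exact recipe.

    # Sketch of the quoted training settings; placeholder model and data.
    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    model = nn.Linear(3 * 32 * 32, 10)                       # stand-in for the searched network
    dataset = TensorDataset(torch.randn(2048, 3 * 32 * 32),
                            torch.randint(0, 10, (2048,)))
    loader = DataLoader(dataset, batch_size=1024, shuffle=True)

    # Settings quoted in the table: SGD, momentum 0.9, base learning rate 0.5.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(250):                                 # 250 training epochs
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()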