Neural Architecture Search in A Proxy Validation Loss Landscape
Authors: Yanxi Li, Minjing Dong, Yunhe Wang, Chang Xu
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on benchmarks demonstrate that the architecture searched by the proposed algorithm can achieve a satisfactory accuracy with less time cost. |
| Researcher Affiliation | Collaboration | 1School of Computer Science, University of Sydney 2Noah's Ark Lab, Huawei Technologies. |
| Pseudocode | Yes | Algorithm 1 Loss Space Regression (a hedged code sketch of loss-space regression follows the table) |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described. |
| Open Datasets | Yes | Following previous works (Liu et al., 2018b; Dong & Yang, 2019), we use the CIFAR-10 dataset (Krizhevsky et al., 2009) for architecture searching and results evaluation. The CIFAR-10 dataset contains 50,000 training images together with 10,000 testing images from 10 classes. The generality of the architecture we obtained is tested on ImageNet 2012 (Russakovsky et al., 2015). |
| Dataset Splits | Yes | During the searching phase, we shuffle the training set and divide it into two parts with equal size for model weights training and validation performance inference respectively. (A minimal data-split sketch follows the table.) |
| Hardware Specification | No | The paper mentions "GPU days" but does not specify any particular GPU models, CPU models, or other hardware specifications used for the experiments. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for reproducibility. |
| Experiment Setup | Yes | We set the number of candidates K = 8, including 4 convolutional operations... The super-network for searching is constructed by stacking 8 cells... The network has 16 initial channels... The warm-up population is initialized with 100 randomly sampled architectures... We trained models in the warm-up population with minibatch gradient descent, whose batch size is set to 64 and the base learning rate is set to 0.025... The architecture weights and validation loss estimator are both optimized by Adam with a constant learning rate of 0.1. The Softmax temperature τ in Gumbel-Softmax is set to 0.1. To evaluate the performance of the obtained architecture, a larger network is constructed with 20 stacked cells and 36 initial channels. The network is trained with the same training setting as in the searching phase for 600 epochs on the complete CIFAR-10 training set. (These settings are collected in the configuration sketch after the table.) |
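
The "Pseudocode" row points to Algorithm 1 (Loss Space Regression). Below is a minimal PyTorch sketch of the general idea, not the authors' implementation: an MLP estimator is fit on a warm-up population of (architecture, validation loss) pairs, and the architecture parameters are then refined by descending this proxy landscape through a Gumbel-Softmax relaxation (τ = 0.1, Adam with learning rate 0.1, K = 8 candidate operations, as quoted above). The number of edges, the estimator's width and depth, and the helper names are assumptions.

```python
# Minimal sketch of loss-space regression (not the authors' code): an MLP
# estimator maps relaxed architecture parameters to a predicted validation
# loss, is fit on a warm-up population of (architecture, loss) pairs, and
# the architecture parameters are then refined by descending the proxy.
import torch
import torch.nn as nn

NUM_EDGES, NUM_OPS = 14, 8          # NUM_EDGES is an assumption; K = 8 candidate operations

class LossEstimator(nn.Module):
    """Regress validation loss from flattened architecture weights."""
    def __init__(self, num_edges=NUM_EDGES, num_ops=NUM_OPS, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_edges * num_ops, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, arch):            # arch: (batch, num_edges, num_ops)
        return self.net(arch.flatten(1)).squeeze(-1)

def fit_estimator(estimator, warmup_archs, warmup_losses, epochs=100):
    """Fit the proxy on the warm-up (architecture, validation-loss) pairs."""
    opt = torch.optim.Adam(estimator.parameters(), lr=0.1)
    for _ in range(epochs):
        opt.zero_grad()
        pred = estimator(warmup_archs)
        nn.functional.mse_loss(pred, warmup_losses).backward()
        opt.step()

def search_architecture(estimator, steps=200, tau=0.1):
    """Descend the proxy landscape over architecture logits (Gumbel-Softmax relaxation)."""
    alpha = torch.zeros(NUM_EDGES, NUM_OPS, requires_grad=True)
    opt = torch.optim.Adam([alpha], lr=0.1)
    for _ in range(steps):
        opt.zero_grad()
        arch = nn.functional.gumbel_softmax(alpha, tau=tau, dim=-1)
        estimator(arch.unsqueeze(0)).mean().backward()
        opt.step()
    return alpha.argmax(dim=-1)         # discrete operation choice per edge
```

The separation of `fit_estimator` and `search_architecture` here is only for readability; per the "Experiment Setup" row, the warm-up losses come from training the 100 sampled architectures with minibatch gradient descent before the proxy is used.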
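
The "Dataset Splits" row describes shuffling the CIFAR-10 training set and dividing it into two equal halves, one for training the super-network weights and one for validation-loss inference. A minimal sketch assuming torchvision's CIFAR-10; the transform, seed, and loader names are illustrative, not taken from the paper.

```python
# Minimal sketch of the searching-phase data split described above, assuming
# torchvision's CIFAR-10; transform, seed, and loader names are illustrative.
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler
from torchvision import datasets, transforms

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())

# Shuffle the 50,000 training images and split them into two equal halves:
# one for training the super-network weights, one for validation-loss inference.
g = torch.Generator().manual_seed(0)            # seed value is an assumption
perm = torch.randperm(len(train_set), generator=g).tolist()
half = len(train_set) // 2

weights_loader = DataLoader(train_set, batch_size=64,
                            sampler=SubsetRandomSampler(perm[:half]))
val_loader = DataLoader(train_set, batch_size=64,
                        sampler=SubsetRandomSampler(perm[half:]))
```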
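
Finally, the hyperparameters quoted in the "Experiment Setup" row collected into a single configuration sketch; the dictionary keys, the SGD momentum value, and the function name are assumptions made for illustration, not settings reported by the paper.

```python
# Hyperparameters quoted in the "Experiment Setup" row, gathered in one place.
# Config keys, the SGD momentum value, and make_optimizers are assumptions.
import torch

SEARCH_CFG = dict(
    num_candidate_ops=8,      # K = 8
    search_cells=8,           # super-network depth
    init_channels=16,
    warmup_population=100,    # randomly sampled architectures
    batch_size=64,
    weight_lr=0.025,          # base learning rate for model weights
    arch_lr=0.1,              # Adam lr for architecture weights / loss estimator
    gumbel_tau=0.1,
)

EVAL_CFG = dict(
    eval_cells=20,
    init_channels=36,
    epochs=600,               # trained on the complete CIFAR-10 training set
)

def make_optimizers(model_params, arch_params, estimator_params):
    """Optimizers as described: minibatch SGD for weights, Adam for the rest."""
    w_opt = torch.optim.SGD(model_params, lr=SEARCH_CFG["weight_lr"],
                            momentum=0.9)           # momentum is an assumption
    a_opt = torch.optim.Adam(list(arch_params) + list(estimator_params),
                             lr=SEARCH_CFG["arch_lr"])
    return w_opt, a_opt
```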