Breaking the Curse of Space Explosion: Towards Efficient NAS with Curriculum Search

Authors: Yong Guo, Yaofo Chen, Yin Zheng, Peilin Zhao, Jian Chen, Junzhou Huang, Mingkui Tan

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of the proposed method. [...] Extensive experiments on several benchmark data sets show that the architectures found by our CNAS significantly outperform the architectures obtained by state-of-the-art NAS methods.
Researcher Affiliation | Collaboration | 1 School of Software Engineering, South China University of Technology 2 Pazhou Laboratory 3 Weixin Group, Tencent 4 Tencent AI Lab, Tencent 5 Guangdong Key Laboratory of Big Data Analysis and Processing. Correspondence to: Mingkui Tan <mingkuitan@scut.edu.cn>, Jian Chen <ellachen@scut.edu.cn>.
Pseudocode | Yes | Algorithm 1 Training method for CNAS.
Open Source Code | Yes | All the implementations are based on PyTorch. We organize the experiments as follows. (Footnote 2: The code is available at https://github.com/guoyongcs/CNAS.)
Open Datasets | Yes | We apply the proposed CNAS to train the controller model on CIFAR-10 (Krizhevsky & Hinton, 2009). Then, we evaluate the searched architectures on CIFAR-10 and ImageNet (Deng et al., 2009).
Dataset Splits | Yes | The goal of RL-based NAS is to find an optimal policy by maximizing the expectation of the reward R(α, w(α)). ... e.g., the accuracy on validation data. [...] We divide the official training set of CIFAR-10 into two parts, 40% for training the super network parameters and 60% for training the controller parameters. [...] Following (Zoph & Le, 2017; Pham et al., 2018), we first sample 10 architectures and then select the architecture with the highest validation accuracy. (A sketch of this split is given below the table.)
Hardware Specification | No | The paper makes no mention of specific CPU or GPU models, memory, or other detailed hardware specifications used for experiments.
Software Dependencies | No | All the implementations are based on PyTorch. The paper mentions PyTorch but does not provide a specific version number or other software dependencies with version numbers.
Experiment Setup | Yes | Training details. ... We train the controller for 320 epochs in total, with 40 epochs for each stage. ... Evaluation details. The final convolution network is stacked with 20 learned cells: 18 normal cells and 2 reduction cells. ... we train the network for 600 epochs using the batch size of 96. We use an SGD optimizer with a weight decay of 3 × 10^-4 and a momentum of 0.9. The learning rate starts from 0.025 and follows the cosine annealing strategy to a minimum of 0.001. ... Evaluation details (ImageNet). ... The network is trained for 250 epochs with a batch size of 256. We use an SGD optimizer with a weight decay of 3 × 10^-5. The momentum term is set to 0.9. The learning rate is initialized to 0.1 and we gradually decrease it to zero. (An optimizer/scheduler sketch with these values is given below the table.)
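The Dataset Splits row above quotes a 40%/60% division of the official CIFAR-10 training set between super-network weights and controller parameters. Below is a minimal sketch of one way to perform that split in PyTorch; the variable names, the fixed seed, and the use of random_split are assumptions for illustration, not the authors' released code (see the GitHub repository linked above).

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Official CIFAR-10 training set (50,000 images).
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())

n_total = len(train_set)
n_weights = int(0.4 * n_total)        # 40% for the super-network parameters
n_controller = n_total - n_weights    # 60% for the controller parameters

# Seeded split for reproducibility; the seed value is an assumption.
weight_set, controller_set = random_split(
    train_set, [n_weights, n_controller],
    generator=torch.Generator().manual_seed(0))
```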
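The Experiment Setup row specifies the evaluation-phase optimizers: SGD with momentum 0.9, weight decay 3 × 10^-4, and a learning rate cosine-annealed from 0.025 to 0.001 over 600 epochs on CIFAR-10; and SGD with momentum 0.9, weight decay 3 × 10^-5, and a learning rate decayed from 0.1 to zero over 250 epochs on ImageNet. The sketch below wires those reported values into standard PyTorch optimizer and scheduler objects; the function names are hypothetical, and the linear decay used for ImageNet is only one reading of "gradually decrease it to zero".

```python
import torch
import torch.nn as nn

def cifar10_eval_optimizer(model: nn.Module):
    """CIFAR-10 evaluation: 600 epochs, batch size 96 (quoted in the table)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.025,
                                momentum=0.9, weight_decay=3e-4)
    # Cosine annealing from 0.025 down to a minimum of 0.001.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=600, eta_min=0.001)
    return optimizer, scheduler

def imagenet_eval_optimizer(model: nn.Module):
    """ImageNet evaluation: 250 epochs, batch size 256 (quoted in the table)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=3e-5)
    # "Gradually decrease to zero": a linear per-epoch decay is assumed here.
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lr_lambda=lambda epoch: 1.0 - epoch / 250)
    return optimizer, scheduler
```

In either case the scheduler would be stepped once per epoch, after the epoch's training loop finishes.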