ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding

Authors: Yibo Yang, Hongyang Li, Shan You, Fei Wang, Chen Qian, Zhouchen Lin

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4 (Experiments): "We analyze the improved efficiency and correlation of the two-stage and one-stage ISTA-NAS, and then compare our search results on CIFAR-10 and ImageNet with state-of-the-art methods."
Researcher Affiliation | Collaboration | Yibo Yang (1,2), Hongyang Li (2), Shan You (3), Fei Wang (3), Chen Qian (3), Zhouchen Lin (2); affiliations: (1) Center for Data Science, Academy for Advanced Interdisciplinary Studies, Peking University; (2) Key Laboratory of Machine Perception (MOE), School of EECS, Peking University; (3) SenseTime
Pseudocode | Yes | Algorithm 1, Two-stage ISTA-NAS (for search only), and Algorithm 2, One-stage ISTA-NAS (for both search and evaluation); an illustrative ISTA iteration is sketched after this table.
Open Source Code | Yes | "Code address: https://github.com/iboing/ISTA-NAS."
Open Datasets | Yes | "Our two-stage method on CIFAR-10 requires only 0.05 GPU-day for search. Our one-stage method produces state-of-the-art performances on both CIFAR-10 and ImageNet at the cost of only evaluation time."
Dataset Splits | No | The paper mentions a validation set (e.g., "Lval" and the super-net accuracies on validation in Figure 2) and states that for one-stage ISTA-NAS "no validation set is split out", which implies a split for the two-stage variant, but it does not give specific split percentages or sample counts for the training, validation, and test sets in the main text.
Hardware Specification | Yes | "Search cost is tested on a GTX 1080Ti GPU." and "Cost is tested on eight GTX 1080Ti GPUs."
Software Dependencies | No | The paper states "We use the released tool MOSEK with CVX [17] to efficiently solve Eq. (9)" but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | "Our setting of search and evaluation is consistent with the convention in current studies [30, 48, 9, 49]. Please see the full description of our implementation details in the supplementary material. For our two-stage ISTA-NAS, the super-net is composed of 6 normal cells and 2 reduction cells. Each cell has 6 nodes. The first two nodes are input nodes output from the previous two cells. As convention, each intermediate node keeps two connections after search, so the sparseness s_j = 2 in our method. We adopt the Adam optimizer for b_j and SGD for network weights W." and "The learning rates are also enlarged by the same times as batch size." A configuration sketch of this setup follows the ISTA example below.
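
For reference, the core update in ISTA-based sparse coding is a gradient step on the quadratic data term followed by soft-thresholding. The sketch below is a generic ISTA solver for a LASSO problem, not a reproduction of the paper's Algorithm 1 or 2; the function names (`soft_threshold`, `ista`), the toy dimensions, and all numeric values are illustrative assumptions. In ISTA-NAS the recovered sparse code plays the role of the architecture parameters, and its support determines which candidate connections each intermediate node keeps (here, two, matching s_j = 2).

```python
import numpy as np

def soft_threshold(v, tau):
    # Element-wise soft-thresholding: the proximal operator of the l1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, x, lam, num_iters=1000):
    # Generic ISTA for the LASSO problem: min_b 0.5*||x - A b||^2 + lam*||b||_1.
    # A is an (m, n) dictionary matrix, x an (m,) observation, lam the l1 weight.
    n = A.shape[1]
    b = np.zeros(n)
    # Step size 1/L, with L the largest eigenvalue of A^T A (the Lipschitz
    # constant of the gradient of the smooth term).
    L = np.linalg.norm(A, 2) ** 2
    for _ in range(num_iters):
        grad = A.T @ (A @ b - x)                    # gradient of the data term
        b = soft_threshold(b - grad / L, lam / L)   # gradient step + shrinkage
    return b

# Toy usage: recover a 2-sparse code, mirroring the two connections kept per node.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 14))
b_true = np.zeros(14)
b_true[[3, 7]] = [1.0, -0.5]
x = A @ b_true
b_hat = ista(A, x, lam=0.05)
print(np.nonzero(np.abs(b_hat) > 1e-2)[0])  # support should concentrate on indices 3 and 7
```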
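
The experiment-setup row can also be summarized as a configuration and optimizer sketch. The snippet below assumes PyTorch; `SuperNet` is a placeholder for the released implementation, and the learning rates, momentum, weight decay, batch sizes, and code length are assumed values. Only the cell and node counts, the sparseness s_j = 2, the Adam-for-b_j / SGD-for-W split, and the batch-size scaling of the learning rate come from the quoted text.

```python
# Minimal PyTorch sketch of the described search setup; not the authors' code.
import torch

search_cfg = {
    "normal_cells": 6,     # quoted: super-net composed of 6 normal cells
    "reduction_cells": 2,  # quoted: and 2 reduction cells
    "nodes_per_cell": 6,   # first two nodes are inputs from the previous two cells
    "sparseness_s_j": 2,   # each intermediate node keeps two connections after search
}

class SuperNet(torch.nn.Module):
    """Placeholder super-net exposing network weights W and sparse codes b_j."""
    def __init__(self, cfg):
        super().__init__()
        # Stand-in for the stacked normal/reduction cells (illustrative only).
        self.backbone = torch.nn.Linear(32, 10)
        # One architecture code b_j per intermediate node (length 14 is an
        # illustrative size, not taken from the paper).
        n_intermediate = cfg["nodes_per_cell"] - 2
        self.arch_codes = torch.nn.ParameterList(
            [torch.nn.Parameter(1e-3 * torch.randn(14)) for _ in range(n_intermediate)]
        )

model = SuperNet(search_cfg)

base_batch_size, batch_size = 64, 256    # assumed values
base_lr = 0.025                          # assumed SGD base learning rate
lr_scale = batch_size / base_batch_size  # "enlarged by the same times as batch size"

# SGD for the network weights W and Adam for the codes b_j, as quoted above.
w_optimizer = torch.optim.SGD(
    model.backbone.parameters(), lr=base_lr * lr_scale, momentum=0.9, weight_decay=3e-4
)
b_optimizer = torch.optim.Adam(model.arch_codes.parameters(), lr=3e-4 * lr_scale)
```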