Semi-Supervised Neural Architecture Search

Authors: Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, Tie-Yan Liu

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On the NASBench-101 benchmark dataset, it achieves accuracy comparable to the gradient-based method while using only 1/7 of the architecture-accuracy pairs, and it achieves higher accuracy under the same computational cost. It achieves 94.02% test accuracy on NASBench-101, outperforming all the baselines when using the same number of architectures. On ImageNet, it achieves a 23.5% top-1 error rate (under a 600M FLOPS constraint) using 4 GPU-days for search. We further apply it to the LJSpeech text-to-speech task, where it achieves a 97% intelligibility rate in the low-resource setting and a 15% test error rate in the robustness setting, with 9% and 7% improvements over the baseline, respectively.
Researcher Affiliation | Collaboration | (1) University of Science and Technology of China, Hefei, China; (2) Microsoft Research Asia, Beijing, China
Pseudocode | Yes | Algorithm 1: Semi-Supervised Neural Architecture Search (a hedged sketch of the semi-supervised controller loop it describes appears after this table)
Open Source Code | No | The paper provides links to open-source code for baseline methods (NAO, ProxylessNAS) but does not explicitly state that the code for the proposed method (SemiNAS) is open source, nor does it provide a link for it.
Open Datasets | Yes | NASBench-101 [37] designs a cell-based search space following common practice [42, 17, 15]. It includes 423,624 CNN architectures and trains each architecture on CIFAR-10 3 times. [...] We conduct experiments on the LJSpeech dataset [10], which contains 13,100 text and speech data pairs with approximately 24 hours of speech audio. (A hedged NASBench-101 query example appears after the table.)
Dataset Splits | Yes | We randomly sample 50,000 images from the training data as a validation set for architecture search. (A hedged split example appears after the table.)
Hardware Specification | Yes | The search runs for 1 day on 4 V100 GPUs. [...] The search runs for 1 day on 4 P40 GPUs.
Software Dependencies | No | The paper mentions software components like the Adam optimizer, SGD optimizer, and LSTM networks but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | We use the Adam optimizer with a learning rate of 0.001. [...] We train the supernet on 4 GPUs for 20,000 steps with a batch size of 128 per card. [...] The discovered architecture is trained for 300 epochs with a total batch size of 256. We use the SGD optimizer with an initial learning rate of 0.05 and a cosine learning rate schedule [16]. (A hedged optimizer and schedule sketch appears below.)
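
The following is a minimal sketch of the semi-supervised controller loop outlined by Algorithm 1, assuming a NAO-style encoder-predictor-decoder controller over a toy one-hot architecture encoding. The `Controller` class, `random_archs`, `evaluate_architecture`, and all sizes are illustrative placeholders, not the authors' implementation.

```python
# Hedged sketch of the semi-supervised loop in Algorithm 1 (SemiNAS builds on
# NAO's encoder-predictor-decoder controller). The architecture encoding,
# network sizes, and evaluation function are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_NODES, NUM_OPS = 7, 5  # toy cell: 7 positions, 5 candidate operations


class Controller(nn.Module):
    """Encoder-predictor-decoder over one-hot architecture sequences."""
    def __init__(self, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(NUM_NODES * NUM_OPS, hidden), nn.ReLU())
        self.predictor = nn.Linear(hidden, 1)                  # predicts accuracy
        self.decoder = nn.Linear(hidden, NUM_NODES * NUM_OPS)  # reconstructs the architecture

    def forward(self, arch):
        h = self.encoder(arch)
        return self.predictor(h).squeeze(-1), self.decoder(h)


def random_archs(n):
    """Sample n random architectures as flattened one-hot op sequences (placeholder)."""
    ops = torch.randint(NUM_OPS, (n, NUM_NODES))
    return F.one_hot(ops, NUM_OPS).float().view(n, -1)


def evaluate_architecture(archs):
    """Placeholder for actually training/evaluating architectures (e.g. a NASBench-101 query)."""
    return torch.rand(archs.size(0))


def fit(controller, archs, accs, steps=200):
    """Train the controller on (architecture, accuracy) pairs; lr 0.001 as quoted above."""
    opt = torch.optim.Adam(controller.parameters(), lr=0.001)
    for _ in range(steps):
        pred_acc, recon = controller(archs)
        loss = F.mse_loss(pred_acc, accs) + F.cross_entropy(
            recon.view(-1, NUM_OPS), archs.view(-1, NUM_OPS).argmax(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()


# 1) A small labeled set: N architectures actually trained and evaluated.
labeled_archs = random_archs(100)
labeled_accs = evaluate_architecture(labeled_archs)

# 2) Train the controller on the labeled pairs only.
controller = Controller()
fit(controller, labeled_archs, labeled_accs)

# 3) Pseudo-label a much larger pool of architectures that are never trained.
unlabeled_archs = random_archs(10000)
with torch.no_grad():
    pseudo_accs, _ = controller(unlabeled_archs)

# 4) Retrain on the union of labeled and pseudo-labeled pairs; the NAO-style
#    gradient-ascent search in the embedding space is not shown here.
fit(controller,
    torch.cat([labeled_archs, unlabeled_archs]),
    torch.cat([labeled_accs, pseudo_accs]))
```

The key step is 3): a controller trained on a small labeled set pseudo-labels a much larger pool of untrained architectures, and the combined set is used to retrain the controller before the architecture optimization step.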
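NASBench-101 lookups such as those cited in the Open Datasets row can be reproduced with the public google-research/nasbench API. The sketch below shows one way to query the recorded CIFAR-10 metrics for a cell; the tfrecord file name and the example cell are assumptions, not taken from the paper.

```python
# Hedged sketch of querying NASBench-101 via the google-research/nasbench API.
from nasbench import api

# Download the NASBench-101 tfrecord (e.g. nasbench_only108.tfrecord) beforehand.
nasbench = api.NASBench('nasbench_only108.tfrecord')

# A cell is a 7x7 upper-triangular adjacency matrix plus a list of node operations.
matrix = [[0, 1, 1, 0, 0, 0, 0],
          [0, 0, 0, 1, 0, 0, 0],
          [0, 0, 0, 0, 1, 0, 0],
          [0, 0, 0, 0, 0, 1, 0],
          [0, 0, 0, 0, 0, 0, 1],
          [0, 0, 0, 0, 0, 0, 1],
          [0, 0, 0, 0, 0, 0, 0]]
ops = ['input', 'conv3x3-bn-relu', 'conv1x1-bn-relu', 'maxpool3x3',
       'conv3x3-bn-relu', 'conv3x3-bn-relu', 'output']

spec = api.ModelSpec(matrix=matrix, ops=ops)
metrics = nasbench.query(spec)  # metrics from one of the 3 recorded CIFAR-10 training runs
print(metrics['validation_accuracy'], metrics['test_accuracy'])
```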
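The 50,000-image validation split for architecture search described in the Dataset Splits row can be approximated as below, assuming a standard torchvision ImageNet folder layout; the path, transform, and random seed are assumptions.

```python
# Hedged sketch of carving a 50,000-image search-validation split out of the
# ImageNet training set; path, transform, and seed are placeholders.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

train_full = datasets.ImageFolder('imagenet/train', transform=transforms.ToTensor())

generator = torch.Generator().manual_seed(0)  # fixed seed for a reproducible split
search_valid_size = 50_000
search_train, search_valid = random_split(
    train_full,
    [len(train_full) - search_valid_size, search_valid_size],
    generator=generator)
```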
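The retraining setup quoted in the Experiment Setup row (SGD, initial learning rate 0.05, cosine schedule, 300 epochs) corresponds to a configuration like the following sketch; the model, momentum value, and training loop body are placeholders.

```python
# Hedged sketch of the quoted retraining schedule: SGD with initial lr 0.05 and
# a cosine learning rate schedule over 300 epochs. Model and momentum are assumed.
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the discovered architecture
epochs = 300
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)  # momentum not stated in the quote
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... one pass over the training data with a total batch size of 256 ...
    scheduler.step()
```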