Semi-Supervised Neural Architecture Search
Authors: Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, Tie-Yan Liu
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the NASBench-101 benchmark dataset, it achieves comparable accuracy with the gradient-based method while using only 1/7 of the architecture-accuracy pairs, and it achieves higher accuracy under the same computational cost: 94.02% test accuracy on NASBench-101, outperforming all the baselines when using the same number of architectures. On ImageNet, it achieves a 23.5% top-1 error rate (under the 600M FLOPS constraint) using 4 GPU-days for search. We further apply it to the LJSpeech text-to-speech task, where it achieves a 97% intelligibility rate in the low-resource setting and a 15% test error rate in the robustness setting, 9% and 7% improvements over the baseline, respectively. |
| Researcher Affiliation | Collaboration | ¹University of Science and Technology of China, Hefei, China; ²Microsoft Research Asia, Beijing, China |
| Pseudocode | Yes | Algorithm 1 Semi-Supervised Neural Architecture Search (a hedged Python sketch of this loop appears after the table) |
| Open Source Code | No | The paper links to open-source code for the baseline methods (NAO, ProxylessNAS) but does not state that the code for the proposed method (SemiNAS) is open-source, nor does it provide a link to it. |
| Open Datasets | Yes | Dataset NASBench-101 [37] designs a cell-based search space following the common practice [42, 17, 15]. It includes 423,624 CNN architectures and trains each architecture on CIFAR-10 for 3 times. [...] We conduct experiments on the LJSpeech dataset [10], which contains 13,100 text and speech data pairs with approximately 24 hours of speech audio. (A sketch of querying the NASBench-101 API appears after the table.) |
| Dataset Splits | Yes | We randomly sample 50,000 images from the training data as the validation set for architecture search. (A sketch of this split appears after the table.) |
| Hardware Specification | Yes | The search runs for 1 day on 4 V100 GPUs. [...] The search runs for 1 day on 4 P40 GPUs. |
| Software Dependencies | No | The paper mentions components such as the Adam optimizer, the SGD optimizer, and LSTM networks, but does not provide version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | We use Adam optimizer with a learning rate of 0.001. [...] We train the supernet on 4 GPUs for 20000 steps with a batch size of 128 per card. [...] The discovered architecture is trained for 300 epochs with a total batch size of 256. We use the SGD optimizer with an initial learning rate of 0.05 and a cosine learning rate schedule [16]. (A PyTorch sketch of this training recipe appears after the table.) |
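
The pseudocode referenced above (Algorithm 1, SemiNAS) trains an accuracy predictor on a small set of evaluated architectures, uses it to pseudo-label a much larger pool of unevaluated architectures, and retrains on the combined data before selecting new architectures. Below is a minimal Python sketch of that loop under our own simplifications: the callables `sample_arch`, `evaluate`, and `fit_predictor` are hypothetical stand-ins for the paper's architecture sampler, evaluation routine, and encoder-predictor-decoder controller, and the final ranking step replaces the paper's gradient-based generation of new architectures in the controller's embedding space.

```python
from typing import Callable, List, Tuple

Arch = Tuple[int, ...]  # toy encoding of an architecture

def semi_nas(
    sample_arch: Callable[[], Arch],        # draws a random architecture
    evaluate: Callable[[Arch], float],      # trains it and returns true accuracy (expensive)
    fit_predictor: Callable[[List[Tuple[Arch, float]]], Callable[[Arch], float]],
    n_labeled: int = 100,
    n_unlabeled: int = 10_000,
    n_candidates: int = 1_000,
) -> List[Arch]:
    # 1) Evaluate a small set of architectures to obtain ground-truth accuracies.
    labeled = [(a, evaluate(a)) for a in (sample_arch() for _ in range(n_labeled))]

    # 2) Train the accuracy predictor (controller) on the labeled pairs.
    predictor = fit_predictor(labeled)

    # 3) Pseudo-label a much larger pool of unevaluated architectures.
    pseudo = [(a, predictor(a)) for a in (sample_arch() for _ in range(n_unlabeled))]

    # 4) Retrain the predictor on real + pseudo-labeled data.
    predictor = fit_predictor(labeled + pseudo)

    # 5) Rank fresh candidates by predicted accuracy and return the best ones.
    candidates = [sample_arch() for _ in range(n_candidates)]
    return sorted(candidates, key=predictor, reverse=True)
```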
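
For the NASBench-101 portion of the open-datasets row, evaluating an architecture is a lookup through the benchmark's Python API rather than actual training. The sketch below follows the interface documented in the google-research/nasbench repository as we understand it; the `.tfrecord` path and the example cell are illustrative and not taken from the paper, so verify the op names and file name against the official README.

```python
from nasbench import api

# Op labels as defined in the NASBench-101 README (verify against the repo).
INPUT, OUTPUT = 'input', 'output'
CONV1X1, CONV3X3, MAXPOOL3X3 = 'conv1x1-bn-relu', 'conv3x3-bn-relu', 'maxpool3x3'

# Load the precomputed benchmark (path is illustrative).
nasbench = api.NASBench('/path/to/nasbench_only108.tfrecord')

# Example 7-node cell: upper-triangular adjacency matrix plus an op per node.
model_spec = api.ModelSpec(
    matrix=[[0, 1, 1, 1, 0, 1, 0],
            [0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 1, 0, 0],
            [0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 0, 0]],
    ops=[INPUT, CONV1X1, CONV3X3, CONV3X3, CONV3X3, MAXPOOL3X3, OUTPUT])

# Query the precomputed training statistics for this cell (a dict that
# includes validation and test accuracy, among other fields).
data = nasbench.query(model_spec)
print(data['validation_accuracy'], data['test_accuracy'])
```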
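
The dataset-split row quotes the ImageNet setup, where 50,000 images are randomly held out from the training data as the validation set used during architecture search. A minimal PyTorch sketch of such a split is shown below, assuming an already-constructed ImageNet training `Dataset`; the fixed seed and the use of `Subset` are our own choices, not details from the paper.

```python
import torch
from torch.utils.data import Dataset, Subset

def split_for_search(train_set: Dataset, num_valid: int = 50_000, seed: int = 0):
    """Hold out `num_valid` randomly chosen images from the training set to
    serve as the validation set for architecture search (the paper holds out
    50,000 ImageNet training images); the remainder stays for training."""
    perm = torch.randperm(len(train_set), generator=torch.Generator().manual_seed(seed))
    search_valid = Subset(train_set, perm[:num_valid].tolist())
    search_train = Subset(train_set, perm[num_valid:].tolist())
    return search_train, search_valid
```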
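
Finally, the experiment-setup row quotes the recipe for training the discovered architecture: 300 epochs, total batch size 256, SGD with an initial learning rate of 0.05, and a cosine learning rate schedule. The PyTorch sketch below wires those quoted numbers together; the momentum value, the cross-entropy loss, and the lack of weight decay or label smoothing are assumptions, not details from the paper.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

def train_discovered_arch(model: nn.Module, train_loader: DataLoader, epochs: int = 300):
    """Train the discovered architecture with the quoted recipe: SGD, initial
    learning rate 0.05, cosine schedule over 300 epochs. The DataLoader is
    expected to use a total batch size of 256; momentum 0.9 is an assumption."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```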