One-Shot Neural Architecture Search via Novelty Driven Sampling
Authors: Miao Zhang, Huiqi Li, Shirui Pan, Taoping Liu, Steven Su
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate the effectiveness and efficiency of our novelty-search-based architecture sampling method. Our approach obtains a state-of-the-art test error of 2.51% on CIFAR-10 with only 7.5 hours of search time on a single GPU, and a competitive validation perplexity of 57.83 and test perplexity of 55.88 on PTB with 4 hours of search time. After transferring to larger datasets, our best models achieve a state-of-the-art test error of 16.56% on CIFAR-100, a competitive 26.66% on ImageNet, and a validation perplexity of 70.14 and test perplexity of 69.31 on WT2. Our method also beats baselines on a NAS benchmark dataset. |
| Researcher Affiliation | Academia | Miao Zhang (1,2), Huiqi Li (1), Shirui Pan (3), Taoping Liu (2), and Steven Su (2). (1) School of Information and Electronics, Beijing Institute of Technology; (2) Faculty of Engineering and Information Technology, University of Technology Sydney; (3) Faculty of Information Technology, Monash University |
| Pseudocode | Yes | Algorithm 1 presents a simple implementation of EN2AS. (A hedged Python sketch of this loop appears below the table.) |
| Open Source Code | Yes | All code, log files, and trained models can be found at https://github.com/MiaoZhang0525/ENNAS_MASTER. |
| Open Datasets | Yes | We perform EN2AS on the small datasets CIFAR-10 and PTB, searching for cell architectures on a smaller supernet with fewer cells during the architecture search phase, and then stack multiple cells to construct larger architectures for full training and evaluation. Finally, the best-learned cells are transferred to CIFAR-100, ImageNet, and WT2 to investigate transferability. We also evaluate the supernet predictive ability of our novelty-based sampling method against two baselines in the following subsections. We adopt NAS-Bench-102 [Dong and Yang, 2020] as a benchmark dataset to analyze our approach in this experiment. |
| Dataset Splits | Yes | Input: Training dataset D_train, validation dataset D_val, test dataset D_test, randomly initialized W, initial architecture archive A = ∅, maximum number of stored architectures S, batch size b, training iterations T. ... Perform random search or EA on the trained supernet with the validation dataset D_val to get α based on Eq. (8). |
| Hardware Specification | No | The paper mentions running experiments on 'a single GPU' but does not specify the GPU model or any other hardware details. |
| Software Dependencies | No | The paper suggests that experimental results can be reproduced by replacing cell structures in DARTS [Liu et al., 2019] but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | All models are trained for 600 epochs (250 for ImageNet), with batch size 96 and an initial channel count of 36, to obtain the test error. ... All models are trained for 1600 epochs with batch size 64 to obtain the perplexity, and we further train our best-found architecture for 3600 epochs to achieve competitive results. ... The embedding and hidden sizes are changed to 700, the weight decay to 5×10⁻⁷, the hidden-node variational dropout to 0.15, and other hyperparameter settings are the same as for PTB. (These settings are collected into a configuration sketch below the table.) |
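
The pseudocode and dataset-splits rows above describe Algorithm 1 (EN2AS): train the supernet by repeatedly sampling the most novel architecture relative to a bounded archive, then search the trained supernet on the validation set via Eq. (8). The paper's exact architecture encoding and novelty metric are not quoted here, so the following is a minimal Python sketch under assumptions: architectures are DARTS-style op-index vectors, novelty is the mean Hamming distance to the k nearest archive members, and `train_step`/`evaluate` are hypothetical caller-supplied hooks standing in for the weight-sharing supernet update and the validation scoring.

```python
import random

NUM_EDGES, NUM_OPS = 14, 8  # assumed DARTS-style cell encoding

def sample_architecture():
    """Draw a random architecture as a tuple of op indices."""
    return tuple(random.randrange(NUM_OPS) for _ in range(NUM_EDGES))

def novelty(arch, archive, k=10):
    """Mean Hamming distance to the k nearest neighbours in archive A
    (a stand-in for the paper's novelty metric)."""
    if not archive:
        return float("inf")
    dists = sorted(sum(x != y for x, y in zip(arch, a)) for a in archive)
    return sum(dists[:k]) / min(k, len(dists))

def en2as(train_step, evaluate, S=100, T=1000, n_candidates=20, n_final=1000):
    """Skeleton of Algorithm 1. `train_step(arch)` updates the shared
    supernet weights on one mini-batch of D_train for the sub-network
    selected by `arch`; `evaluate(arch)` scores an architecture on D_val."""
    archive = []                 # architecture archive A, initially empty
    for _ in range(T):
        # Sample candidates and keep the most novel one for this update.
        pool = [sample_architecture() for _ in range(n_candidates)]
        alpha = max(pool, key=lambda a: novelty(a, archive))
        train_step(alpha)        # one weight-sharing supernet update
        archive.append(alpha)
        if len(archive) > S:     # keep at most S stored architectures
            archive.pop(0)
    # Random search on the trained supernet with the validation set.
    finalists = [sample_architecture() for _ in range(n_final)]
    return max(finalists, key=evaluate)
```

With dummy hooks, e.g. `en2as(lambda a: None, lambda a: random.random(), T=10)`, the skeleton runs end to end; in practice `train_step` would activate only the sub-network selected by `alpha`, which is what keeps one-shot search cheap.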
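
For quick reference, the evaluation hyperparameters quoted in the experiment-setup row can be collected into one configuration block. The values come from the quoted text; the key names are our own, and nothing beyond the quote is assumed.

```python
# Evaluation settings as quoted above; key names are illustrative.
CIFAR10_EVAL = dict(epochs=600, batch_size=96, init_channels=36)
IMAGENET_EVAL = dict(epochs=250, batch_size=96, init_channels=36)
PTB_EVAL = dict(epochs=1600, batch_size=64)  # best model retrained for 3600
WT2_EVAL = dict(
    epochs=1600,                 # same as PTB per the quote
    batch_size=64,
    embedding_size=700,
    hidden_size=700,
    weight_decay=5e-7,           # 5 × 10^-7
    hidden_node_dropout=0.15,    # variational dropout
)
```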