SNAS: Stochastic Neural Architecture Search
Authors: Sirui Xie, Hehui Zheng, Chunxiao Liu, Liang Lin
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, SNAS shows strong performance compared with DARTS and all other existing NAS methods in terms of test error, model complexity and searching resources. Specifically, SNAS discovers novel convolutional cells achieving 2.85 ± 0.02% test error on CIFAR-10 with only 2.8M parameters, which is better than 3.00 ± 0.14% with 3.3M parameters from 1st-order DARTS and 2.89% with 4.6M parameters from ENAS. |
| Researcher Affiliation | Industry | Sirui Xie, Hehui Zheng, Chunxiao Liu, Liang Lin (SenseTime) {xiesirui, zhenghehui, liuchunxiao}@sensetime.com, linliang@ieee.org |
| Pseudocode | No | The paper describes methods through mathematical equations and textual descriptions, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions publicly released code by other researchers (Liu et al., 2019 and Pham et al., 2018) but does not provide a statement or link for the open-source code of SNAS itself. |
| Open Datasets | Yes | The CIFAR-10 dataset (Krizhevsky & Hinton, 2009) is a basic dataset for image classification, which consists of 50,000 training images and 10,000 testing images. The discovered cell achieves 27.3% top-1 error when transferred to ImageNet (mobile setting)... |
| Dataset Splits | Yes | The CIFAR-10 dataset (Krizhevsky & Hinton, 2009) consists of 50,000 training images and 10,000 testing images. Data transformation is achieved by the standard data pre-processing and augmentation techniques (see Appendix G.1), normalizing the training and validation images by subtracting the channel mean and dividing by the channel standard deviation (see the preprocessing sketch after this table). |
| Hardware Specification | Yes | All the experiments were performed using NVIDIA TITAN Xp GPUs |
| Software Dependencies | No | The paper mentions PyTorch as the implementation framework but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The neural operation parameters θ are optimized using momentum SGD, with initial learning rate η_θ = 0.025 (annealed down to zero following a cosine schedule), momentum 0.9, and weight decay 3 × 10⁻⁴. The architecture distribution parameters α are optimized by Adam, with initial learning rate η_α = 3 × 10⁻⁴, momentum β = (0.5, 0.999) and weight decay 10⁻³. The batch size employed is 64 and the initial number of channels is 16 (see the optimizer sketch after this table). |
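
The pre-processing described in the Dataset Splits row can be summarized as a short torchvision sketch. The paper only confirms "standard" augmentation (Appendix G.1) and normalization by channel mean and standard deviation; the crop and flip settings and the exact mean/std constants below are common CIFAR-10 defaults, assumed here rather than quoted from the paper.

```python
import torchvision.transforms as transforms

# Commonly used CIFAR-10 per-channel statistics (assumed, not stated in the paper).
CIFAR_MEAN = [0.4914, 0.4822, 0.4465]
CIFAR_STD = [0.2470, 0.2435, 0.2616]

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),         # standard augmentation (assumed)
    transforms.RandomHorizontalFlip(),            # standard augmentation (assumed)
    transforms.ToTensor(),
    transforms.Normalize(CIFAR_MEAN, CIFAR_STD),  # subtract channel mean, divide by channel std
])

valid_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(CIFAR_MEAN, CIFAR_STD),  # validation images normalized the same way
])
```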
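The hyperparameters in the Experiment Setup row map onto the following minimal PyTorch sketch. The helper name `build_search_optimizers` and the accessors `weight_parameters()` / `arch_parameters()` are hypothetical placeholders for separating the two parameter groups; only the learning rates, momenta, weight decays, and the cosine schedule are taken from the row above.

```python
import torch

def build_search_optimizers(model, num_epochs):
    """Hypothetical helper reproducing the reported optimizer settings."""
    # Momentum SGD for the neural operation parameters theta,
    # with the learning rate annealed to zero by a cosine schedule.
    optimizer_theta = torch.optim.SGD(
        model.weight_parameters(),    # assumed accessor for operation weights
        lr=0.025, momentum=0.9, weight_decay=3e-4,
    )
    scheduler_theta = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer_theta, T_max=num_epochs, eta_min=0.0,
    )
    # Adam for the architecture distribution parameters alpha.
    optimizer_alpha = torch.optim.Adam(
        model.arch_parameters(),      # assumed accessor for architecture parameters
        lr=3e-4, betas=(0.5, 0.999), weight_decay=1e-3,
    )
    return optimizer_theta, scheduler_theta, optimizer_alpha
```

The batch size of 64 and the initial channel count of 16 belong to the data loader and network construction rather than the optimizers, so they are not shown here.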