XNAS: Neural Architecture Search with Expert Advice
Authors: Niv Nayman, Asaf Noy, Tal Ridnik, Itamar Friedman, Rong Jin, Lihi Zelnik-Manor
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our algorithm achieves a strong performance over several image classification datasets. Specifically, it obtains an error rate of 1.6% for CIFAR-10, 23.9% for ImageNet under mobile settings, and achieves state-of-the-art results on three additional datasets. |
| Researcher Affiliation | Industry | Niv Nayman, Asaf Noy, Tal Ridnik, Itamar Friedman, Rong Jin, Lihi Zelnik-Manor; Machine Intelligence Technology, Alibaba Group; {niv.nayman,asaf.noy,tal.ridnik,itamar.friedman,jinrong.jr,lihi.zelnik}@alibaba-inc.com |
| Pseudocode | Yes (see the sketch after the table) | Algorithm 1: XNAS for a single forecaster |
| Open Source Code | Yes | XNAS evaluation results can be reproduced using the code: https://github.com/NivNayman/XNAS |
| Open Datasets | Yes | We used the CIFAR-10 dataset for the main search and evaluation phase. In addition, using the cell found on CIFAR-10 we did transferability experiments on the well-known benchmarks ImageNet, CIFAR-100, SVHN, Fashion-MNIST, Freiburg and CINIC-10. |
| Dataset Splits | Yes (see the split sketch after the table) | The train set is divided into two parts of equal sizes: one is used for training the operations weights ω and the other for training the architecture weights v, both with respect to the cross entropy loss. With a batch size of 96, one epoch takes 8.5 minutes on average on a single GPU, summing up to 7 hours in total for a single search. For example, for CIFAR-10 with 50%:50% train-validation split, 50 search epochs... |
| Hardware Specification | Yes | Experiments were performed using an NVIDIA GTX 1080Ti GPU. |
| Software Dependencies | No | The paper mentions optimizers such as SGD with Nesterov momentum and Adam, but does not provide specific version numbers for software libraries or environments (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes (see the learning-rate sketch after the table) | The search phase lasts up to 50 epochs. We use the first-order approximation [25], relating to v and ω as independent parameters which can be optimized separately. The train set is divided into two parts of equal sizes: one is used for training the operations weights ω and the other for training the architecture weights v, both with respect to the cross entropy loss. With a batch size of 96, one epoch takes 8.5 minutes on average on a single GPU, summing up to 7 hours in total for a single search. We trained the network for 1500 epochs using a batch size of 96 and an SGD optimizer with Nesterov momentum. Our learning rate regime was composed of 5 cycles of power cosine annealing learning rate [17], with an amplitude decay factor of 0.5 per cycle. For regularization we used cutout [9], scheduled drop-path [22], auxiliary towers [39], label smoothing [40], AutoAugment [7] and weight decay. |
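
The Pseudocode row refers to Algorithm 1, XNAS for a single forecaster, which updates one edge's architecture weights with an exponentiated-gradient (expert-advice) step followed by a wipeout of weak experts. The NumPy sketch below illustrates that flavor of update under simplifying assumptions: the gradient signal `grads`, the step size `eta`, and the `prune_ratio` wipeout rule are placeholders, not the exact quantities or the theoretically derived threshold used in the paper.

```python
import numpy as np

def xnas_forecaster_step(weights, grads, eta, prune_ratio=0.01):
    """One exponentiated-gradient step for a single forecaster (edge).

    weights:     current positive weights over the candidate operations (experts).
    grads:       gradient of the loss w.r.t. each expert's contribution (assumed given).
    eta:         exponentiated-gradient step size.
    prune_ratio: experts whose weight falls below prune_ratio * max(weights)
                 are wiped out (simplified stand-in for the paper's criterion).
    """
    # Multiplicative (exponentiated-gradient) update.
    weights = weights * np.exp(-eta * grads)
    # Wipeout: permanently remove experts that fell below the threshold.
    weights[weights < prune_ratio * weights.max()] = 0.0
    # Renormalize so the surviving experts form a distribution.
    return weights / weights.sum()
```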
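
The Dataset Splits row describes a 50%:50% split of the CIFAR-10 training set, one half for the operation weights ω and the other for the architecture weights v; the quoted search cost is consistent with the per-epoch timing (50 epochs × 8.5 min/epoch ≈ 425 min ≈ 7 hours). A minimal PyTorch/torchvision sketch of such a split, assuming the standard CIFAR-10 loader, a batch size of 96, and an illustrative transform, might look like this:

```python
import numpy as np
import torch
import torchvision
import torchvision.transforms as T

# Load the CIFAR-10 training set (50,000 images).
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())

# 50%:50% split: first half trains the operation weights (omega),
# second half trains the architecture weights (v).
indices = np.random.permutation(len(train_set)).tolist()
split = len(train_set) // 2

weights_loader = torch.utils.data.DataLoader(
    train_set, batch_size=96,
    sampler=torch.utils.data.SubsetRandomSampler(indices[:split]))
arch_loader = torch.utils.data.DataLoader(
    train_set, batch_size=96,
    sampler=torch.utils.data.SubsetRandomSampler(indices[split:]))
```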
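
The Experiment Setup row mentions 5 cycles of power cosine annealing with an amplitude decay factor of 0.5 per cycle over 1500 evaluation epochs. One plausible reading of that schedule is sketched below; the peak rate `lr_max` and the cosine exponent `power` are assumptions, since the exact form is defined in the paper's reference [17].

```python
import math

def cyclic_power_cosine_lr(epoch, total_epochs=1500, num_cycles=5,
                           lr_max=0.025, amp_decay=0.5, power=2.0):
    """Cyclic 'power cosine' learning-rate schedule (sketch, not the paper's exact formula).

    The epochs are divided into num_cycles equal cycles; within each cycle the
    rate follows a cosine curve raised to `power`, and each cycle's peak is
    scaled by amp_decay relative to the previous one.
    """
    cycle_len = total_epochs // num_cycles
    cycle = epoch // cycle_len
    t = (epoch % cycle_len) / cycle_len            # position within the cycle, in [0, 1)
    cosine = 0.5 * (1.0 + math.cos(math.pi * t))   # decays from 1 to 0 over the cycle
    return lr_max * (amp_decay ** cycle) * (cosine ** power)
```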