XNAS: Neural Architecture Search with Expert Advice

Authors: Niv Nayman, Asaf Noy, Tal Ridnik, Itamar Friedman, Rong Jin, Lihi Zelnik-Manor

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our algorithm achieves a strong performance over several image classification datasets. Specifically, it obtains an error rate of 1.6% for CIFAR-10, 23.9% for ImageNet under mobile settings, and achieves state-of-the-art results on three additional datasets.
Researcher Affiliation | Industry | Niv Nayman, Asaf Noy, Tal Ridnik, Itamar Friedman, Rong Jin, Lihi Zelnik-Manor; Machine Intelligence Technology, Alibaba Group; {niv.nayman,asaf.noy,tal.ridnik,itamar.friedman,jinrong.jr,lihi.zelnik}@alibaba-inc.com
Pseudocode | Yes | Algorithm 1, "XNAS for a single forecaster" (an illustrative code sketch of this update follows the table).
Open Source Code | Yes | XNAS evaluation results can be reproduced using the code: https://github.com/NivNayman/XNAS
Open Datasets | Yes | We used the CIFAR-10 dataset for the main search and evaluation phase. In addition, using the cell found on CIFAR-10 we did transferability experiments on the well-known benchmarks ImageNet, CIFAR-100, SVHN, Fashion-MNIST, Freiburg and CINIC-10.
Dataset Splits | Yes | The train set is divided into two parts of equal sizes: one is used for training the operation weights ω and the other for training the architecture weights v, both with respect to the cross-entropy loss. With a batch size of 96, one epoch takes 8.5 minutes on average on a single GPU, summing up to 7 hours in total for a single search. For example, for CIFAR-10 with a 50%:50% train-validation split, 50 search epochs... (a PyTorch sketch of this split follows the table).
Hardware Specification | Yes | Experiments were performed using an NVIDIA GTX 1080Ti GPU.
Software Dependencies | No | The paper mentions optimizers like SGD with Nesterov momentum and Adam, but does not provide specific version numbers for software libraries or environments (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | The search phase lasts up to 50 epochs. We use the first-order approximation [25], relating to v and ω as independent parameters which can be optimized separately. The train set is divided into two parts of equal sizes: one is used for training the operation weights ω and the other for training the architecture weights v, both with respect to the cross-entropy loss. With a batch size of 96, one epoch takes 8.5 minutes on average on a single GPU, summing up to 7 hours in total for a single search. We trained the network for 1500 epochs using a batch size of 96 and an SGD optimizer with Nesterov momentum. Our learning rate regime was composed of 5 cycles of power cosine annealing learning rate [17], with an amplitude decay factor of 0.5 per cycle (a sketch of this schedule follows the table). For regularization we used cutout [9], scheduled drop-path [22], auxiliary towers [39], label smoothing [40], AutoAugment [7] and weight decay.
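The Pseudocode row refers to Algorithm 1, "XNAS for a single forecaster", which updates the architecture weights of one edge with an exponentiated-gradient (expert-advice) step and permanently wipes out weak operations. The sketch below is a minimal illustration of that idea, not the authors' implementation: the function name xnas_forecaster_step, the learning rate eta, and the fixed wipeout threshold are assumptions (the paper derives its wipeout criterion from a regret bound rather than a constant).

```python
import numpy as np

def xnas_forecaster_step(weights, grad, alive, eta=1.0, threshold=0.05):
    """One illustrative exponentiated-gradient step for a single forecaster.

    weights   : non-negative architecture weights of the candidate operations (experts)
    grad      : gradient of the validation loss w.r.t. the mixture weights
    alive     : boolean mask of experts that have not been wiped out yet
    eta       : exponentiated-gradient learning rate (assumed hyperparameter)
    threshold : assumed fixed wipeout level standing in for the paper's regret-based rule
    """
    w = weights * alive
    # Multiplicative (Hedge-style) update: experts whose gradient is negative gain mass.
    w = w * np.exp(-eta * grad)
    # Wipeout: once an expert drops below the threshold it never returns.
    alive = alive & (w >= threshold)
    w = w * alive
    # Renormalize the surviving experts to a probability vector.
    w = w / max(w.sum(), 1e-12)
    return w, alive

# Example: three candidate operations on one edge.
w = np.ones(3) / 3
alive = np.ones(3, dtype=bool)
w, alive = xnas_forecaster_step(w, grad=np.array([0.2, -0.1, 1.5]), alive=alive)
```

Unlike softmax-based relaxations, a wiped-out operation never re-enters the search; only the surviving experts keep competing.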
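For the Dataset Splits row, one common way to realize the 50%:50% split of the CIFAR-10 train set, with one half feeding the operation weights ω and the other the architecture weights v, is sketched below in PyTorch. This is an assumed setup for illustration, not the loader from the released repository.

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler
from torchvision import datasets, transforms

# CIFAR-10 train set; one half trains the operation weights, the other half
# trains the architecture weights, as described in the Dataset Splits row.
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())

n = len(train_set)                       # 50,000 images
indices = torch.randperm(n).tolist()
split = n // 2                           # 50%:50% split

weights_loader = DataLoader(train_set, batch_size=96,
                            sampler=SubsetRandomSampler(indices[:split]))
arch_loader = DataLoader(train_set, batch_size=96,
                         sampler=SubsetRandomSampler(indices[split:]))
```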
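For the Experiment Setup row, the evaluation-phase learning-rate regime (5 cycles of power cosine annealing over 1500 epochs, amplitude decayed by 0.5 per cycle) could look roughly like the sketch below. The base learning rate and the power p are assumed values; the exact power-cosine form follows [17], while the cycle count, epoch budget, and 0.5 amplitude decay come from the quoted setup.

```python
import math

def power_cosine_cyclic_lr(epoch, total_epochs=1500, cycles=5,
                           base_lr=0.025, amplitude_decay=0.5, p=2.0):
    """Cyclic power cosine annealing: 5 cycles, amplitude halved each cycle.

    base_lr and the power p are illustrative assumptions; the cycle structure
    and the 0.5 amplitude decay follow the Experiment Setup row.
    """
    cycle_len = total_epochs // cycles
    cycle = min(epoch // cycle_len, cycles - 1)   # index of the current cycle
    t = (epoch % cycle_len) / cycle_len           # progress within the cycle, in [0, 1)
    amplitude = base_lr * (amplitude_decay ** cycle)
    # A cosine decay raised to the power p spends more of each cycle
    # at low learning rates than a plain cosine schedule.
    return amplitude * ((1.0 + math.cos(math.pi * t)) / 2.0) ** p

# Example: learning rate at the start of each of the 5 cycles.
cycle_starts = [power_cosine_cyclic_lr(e) for e in range(0, 1500, 300)]
```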