Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
XNAS: Neural Architecture Search with Expert Advice
Authors: Niv Nayman, Asaf Noy, Tal Ridnik, Itamar Friedman, Rong Jin, Lihi Zelnik
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our algorithm achieves a strong performance over several image classification datasets. Specifically, it obtains an error rate of 1.6% for CIFAR-10, 23.9% for ImageNet under mobile settings, and achieves state-of-the-art results on three additional datasets. |
| Researcher Affiliation | Industry | Niv Nayman, Asaf Noy, Tal Ridnik, Itamar Friedman, Rong Jin, Lihi Zelnik-Manor. Machine Intelligence Technology, Alibaba Group. {niv.nayman,asaf.noy,tal.ridnik,itamar.friedman,jinrong.jr,lihi.zelnik}@alibaba-inc.com |
| Pseudocode | Yes | Algorithm 1 XNAS for a single forecaster |
| Open Source Code | Yes | XNAS evaluation results can be reproduced using the code: https://github.com/NivNayman/XNAS |
| Open Datasets | Yes | We used the CIFAR-10 dataset for the main search and evaluation phase. In addition, using the cell found on CIFAR-10 we did transferability experiments on the well-known benchmarks ImageNet, CIFAR-100, SVHN, Fashion-MNIST, Freiburg and CINIC10. |
| Dataset Splits | Yes | The train set is divided into two parts of equal sizes: one is used for training the operations weights ω and the other for training the architecture weights v, both with respect to the cross entropy loss. With a batch size of 96, one epoch takes 8.5 minutes in average on a single GPU, summing up to 7 hours in total for a single search. For example, for CIFAR10 with 50%:50% train-validation split, 50 search epochs... |
| Hardware Specification | Yes | Experiments were performed using a NVIDIA GTX 1080Ti GPU. |
| Software Dependencies | No | The paper mentions optimizers like SGD with nesterov-momentum and Adam, but does not provide specific version numbers for software libraries or environments (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The search phase lasts up to 50 epochs. We use the first-order approximation [25], relating to v and ω as independent parameters which can be optimized separately. The train set is divided into two parts of equal sizes: one is used for training the operations weights ω and the other for training the architecture weights v, both with respect to the cross entropy loss. With a batch size of 96, one epoch takes 8.5 minutes in average on a single GPU, summing up to 7 hours in total for a single search. We trained the network for 1500 epochs using a batch size of 96 and SGD optimizer with nesterov-momentum. Our learning rate regime was composed of 5 cycles of power cosine annealing learning rate [17], with amplitude decay factor of 0.5 per cycle. For regularization we used cutout [9], scheduled drop-path [22], auxiliary towers [39], label smoothing [40], AutoAugment [7] and weight decay. |
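
The setup quoted in the table can be sanity-checked with a short sketch. The Python below is illustrative only and is not taken from the XNAS repository. It splits the 50,000 CIFAR-10 training indices into two equal halves (one for the operation weights ω, one for the architecture weights v), recomputes the search time from 50 epochs at 8.5 minutes each, and outlines a 5-cycle schedule over 1500 epochs whose amplitude halves every cycle. Two assumptions are worth flagging: the function names and the `base_lr=0.025` value are hypothetical, since the excerpt states no learning rate, and plain cosine annealing stands in for the power cosine variant the paper cites as [17].

```python
import math
import random


def split_train_indices(n_samples, seed=0):
    """Shuffle sample indices and return two disjoint halves of equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    half = n_samples // 2
    # First half: operation weights (omega); second half: architecture weights (v).
    return indices[:half], indices[half:2 * half]


def search_hours(epochs=50, minutes_per_epoch=8.5):
    """Total search time in hours for the quoted per-epoch cost."""
    return epochs * minutes_per_epoch / 60.0


def cyclic_cosine_lr(epoch, total_epochs=1500, cycles=5, base_lr=0.025, decay=0.5):
    """Cosine annealing restarted each cycle, with the peak scaled by `decay`.

    Assumption: ordinary cosine annealing replaces the paper's power cosine
    schedule, and base_lr is a hypothetical placeholder value.
    """
    cycle_len = total_epochs / cycles
    cycle = min(int(epoch // cycle_len), cycles - 1)
    progress = (epoch - cycle * cycle_len) / cycle_len
    amplitude = base_lr * decay ** cycle
    return 0.5 * amplitude * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    weight_idx, arch_idx = split_train_indices(50_000)
    print(len(weight_idx), len(arch_idx))
    print(round(search_hours(), 2))
    print([cyclic_cosine_lr(e) for e in (0, 300, 600, 900, 1200)])
```

Under these assumptions, 50 epochs at 8.5 minutes come to 425 minutes, just over the 7 hours quoted, and each restart begins at half the previous cycle's peak learning rate.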