Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Neural Architecture Search: A Survey
Authors: Thomas Elsken, Jan Hendrik Metzen, Frank Hutter
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide an overview of existing work in this field of research and categorize them according to three dimensions: search space, search strategy, and performance estimation strategy. ... Already by now, NAS methods have outperformed manually designed architectures on some tasks such as image classification (Zoph et al., 2018; Real et al., 2019), object detection (Zoph et al., 2018) or semantic segmentation (Chen et al., 2018). ... Real et al. (2019) conduct a case study comparing RL, evolution, and random search (RS), concluding that RL and evolution perform equally well in terms of final test accuracy, with evolution having better anytime performance and finding smaller models. Both approaches consistently perform better than RS in their experiments, but with a rather small margin: RS achieved test errors of approximately 4% on CIFAR-10, while RL and evolution reached approximately 3.5% (after model augmentation where depth and number of filters was increased; the difference on the non-augmented space actually used for the search was approx. 2%). |
| Researcher Affiliation | Collaboration | Thomas Elsken EMAIL Bosch Center for Artificial Intelligence 71272 Renningen, Germany and University of Freiburg. Jan Hendrik Metzen EMAIL Bosch Center for Artificial Intelligence 71272 Renningen, Germany. Frank Hutter EMAIL University of Freiburg 79110 Freiburg, Germany. |
| Pseudocode | No | The paper provides descriptive text and figures to illustrate concepts, but does not contain any clearly labeled pseudocode or algorithm blocks for its own methodology or for the methods it surveys. |
| Open Source Code | No | The paper is a survey and describes various methodologies, but does not provide any concrete access to source code for the methodology described in this paper by its authors. It mentions code in the context of other works (e.g., 'open-source AutoML system'), but not for its own contribution. |
| Open Datasets | Yes | Already by now, NAS methods have outperformed manually designed architectures on some tasks such as image classification (Zoph et al., 2018; Real et al., 2019), object detection (Zoph et al., 2018) or semantic segmentation (Chen et al., 2018). ... Real et al. (2019) conduct a case study comparing RL, evolution, and random search (RS), concluding that RL and evolution perform equally well in terms of final test accuracy, with evolution having better anytime performance and finding smaller models. Both approaches consistently perform better than RS in their experiments, but with a rather small margin: RS achieved test errors of approximately 4% on CIFAR-10, while RL and evolution reached approximately 3.5%... The difference was even smaller for Liu et al. (2018b), who reported a test error of 3.9% on CIFAR-10 and a top-1 validation error of 21.0% on ImageNet for RS, compared to 3.75% and 20.3% for their evolution-based method, respectively. ... While most authors report results on the CIFAR-10 data set... ... for optimizing recurrent neural networks (Greff et al., 2015; Jozefowicz et al., 2015; Zoph and Le, 2017; Rawal and Miikkulainen, 2018), e.g., for language or music modeling. |
| Dataset Splits | No | The paper discusses various datasets used in the reviewed literature (e.g., CIFAR-10, ImageNet, Penn Treebank) and mentions that 'measurements of an architecture’s performance depend on many factors other than the architecture itself. While most authors report results on the CIFAR-10 data set, experiments often differ with regard to search space, computational budget, data augmentation, training procedures, regularization, and other factors.' However, it does not explicitly provide specific dataset split information for any experiments conducted by the authors of this survey paper, nor does it detail standard splits for the mentioned public datasets. |
| Hardware Specification | No | The paper is a survey and describes the hardware usage of other works (e.g., '800 GPUs for three to four weeks', 'computational demands in the order of thousands of GPU days for NAS'), but it does not specify any hardware details for running its own experiments. |
| Software Dependencies | No | The paper discusses various algorithms and approaches (e.g., 'REINFORCE policy gradient algorithm', 'Proximal Policy Optimization', 'Q-learning', 'Gaussian processes'), but it does not provide specific software names with version numbers for any ancillary software dependencies. |
| Experiment Setup | No | The paper is a survey and discusses experimental setups of other research (e.g., 'a cosine annealing learning rate schedule', 'data augmentation by Cutout', 'training for fewer epochs'), but it does not provide specific experimental setup details, hyperparameters, or training configurations for its own work. |
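The table above repeatedly contrasts random search (RS) with RL- and evolution-based NAS strategies. As a rough illustration of what the RS baseline amounts to, the sketch below samples architectures uniformly from a tiny hypothetical search space and keeps the best one under a mock evaluation function. The search space, the `mock_evaluate` scoring, and all names are illustrative assumptions, not the setup of any work surveyed in the paper; a real NAS run would train and validate each sampled network.

```python
import random

# Hypothetical toy search space: an "architecture" is a depth/width/op choice.
SEARCH_SPACE = {
    "depth": [2, 4, 8],
    "width": [16, 32, 64],
    "op": ["conv3x3", "conv5x5", "maxpool"],
}

def sample_architecture(rng):
    """Draw one architecture uniformly at random from the search space."""
    return {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}

def mock_evaluate(arch):
    """Stand-in for training + validation accuracy (purely illustrative)."""
    score = arch["depth"] * 0.5 + arch["width"] * 0.01
    if arch["op"] == "conv3x3":
        score += 1.0
    return score

def random_search(n_trials=20, seed=0):
    """Sample n_trials architectures and return the best one with its score."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample_architecture(rng)
        score = mock_evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```

Despite its simplicity, this kind of baseline is what the quoted case studies found hard to beat by a large margin, which is why the survey stresses comparing NAS methods against it under matched budgets.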