Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

NAS-Bench-Suite: NAS Evaluation is (Now) Surprisingly Easy

Authors: Yash Mehta, Colin White, Arber Zela, Arjun Krishnakumar, Guri Zabergja, Shakiba Moradian, Mahmoud Safari, Kaicheng Yu, Frank Hutter

ICLR 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | In this work, we present an in-depth analysis of popular NAS algorithms and performance prediction methods across 25 different combinations of search spaces and datasets, finding that many conclusions drawn from a few NAS benchmarks do not generalize to other benchmarks. |
| Researcher Affiliation | Collaboration | 1 University of Freiburg, 2 Abacus.AI, 3 Bosch Center for AI |
| Pseudocode | Yes | Snippet 1: A minimal example on how one can run a NAS algorithm in NAS-Bench-Suite. Both the search space and the algorithm can be changed in one line of code. |
| Open Source Code | Yes | Our code is available at https://github.com/automl/naslib. |
| Open Datasets | Yes | This benchmark consists of 423 624 architectures trained on CIFAR-10. ... The search space consists of 8 242 architectures trained on the TIMIT dataset. ... evaluated across four datasets: ImageNet50-1000, Cityscapes, KITTI, and HMDB51. |
| Dataset Splits | Yes | NAS-Bench-101 comes with precomputed validation and test accuracies at epochs 4, 12, 36, and 108 from training on CIFAR-10. ... Each architecture has precomputed train, validation, and test losses and accuracies for 200 epochs on CIFAR-10, CIFAR-100, and ImageNet-16-120. |
| Hardware Specification | No | This is in contrast to the NAS-Bench-Suite, where NAS algorithms take at most 5 minutes on a CPU due to the use of queryable benchmarks. (Mentioned CPU and GPU but no specific models or configurations.) |
| Software Dependencies | No | A search space is defined with a graph object using PyTorch and NetworkX (Hagberg et al., 2008)... We use the original implementation from the pybnn package. ... We use the Scikit-learn implementation (Pedregosa et al., 2011). ... We used the original code (Chen & Guestrin, 2016). (Software names are mentioned, but no specific version numbers are provided for PyTorch, NetworkX, pybnn, Scikit-learn, or XGBoost.) |
| Experiment Setup | No | For a list of the default hyperparameters and hyperparameter ranges, see https://github.com/automl/NASLib. (Specific hyperparameter values are explicitly deferred to an external link, not provided in the main text of the paper.) |
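
The Hardware Specification entry attributes the short CPU runtimes to queryable benchmarks, where an architecture's accuracy is looked up instead of being obtained by training. The sketch below illustrates that idea with a toy table and a plain random search. It is not the NASLib API or the paper's Snippet 1: the operation names, the `build_toy_benchmark` and `random_search` helpers, and all accuracy values are hypothetical and made up for illustration.

```python
import itertools
import random

# Hypothetical operation choices for each edge of a tiny cell.
OPS = ("conv3x3", "conv1x1", "skip")


def build_toy_benchmark(num_edges=3, seed=0):
    """Return a dict mapping every architecture to a made-up accuracy.

    A real tabular benchmark stores accuracies measured by actually
    training each architecture; here the values are random placeholders.
    """
    rng = random.Random(seed)
    return {
        arch: rng.uniform(0.80, 0.95)
        for arch in itertools.product(OPS, repeat=num_edges)
    }


def random_search(benchmark, budget=10, seed=1):
    """Sample `budget` architectures and keep the best one.

    Each evaluation is a dictionary lookup, so no training is needed.
    """
    rng = random.Random(seed)
    archs = list(benchmark)
    best_arch, best_acc = None, float("-inf")
    for _ in range(budget):
        arch = rng.choice(archs)
        acc = benchmark[arch]  # query the table instead of training
        if acc > best_acc:
            best_arch, best_acc = arch, acc
    return best_arch, best_acc


if __name__ == "__main__":
    bench = build_toy_benchmark()
    arch, acc = random_search(bench)
    print(f"best architecture: {arch}, accuracy: {acc:.4f}")
```

Because every evaluation is a lookup, the cost of a search run is dominated by the algorithm's own bookkeeping, which is consistent with the report's observation that no specific hardware needed to be named for these runs.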