Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Evaluating Efficient Performance Estimators of Neural Architectures
Authors: Xuefei Ning, Changcheng Tang, Wenshuo Li, Zixuan Zhou, Shuang Liang, Huazhong Yang, Yu Wang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we conduct an extensive and organized assessment of OSEs and ZSEs on five NAS benchmarks: NAS-Bench-101/201/301, and NDS ResNet/ResNeXt-A. Specifically, we employ a set of NAS-oriented criteria to study the behavior of OSEs and ZSEs, and reveal their biases and variances. |
| Researcher Affiliation | Collaboration | Department of Electronic Engineering, Tsinghua University; Novauto Technology Co. Ltd. |
| Pseudocode | No | The paper does not contain any sections explicitly labeled as 'Pseudocode' or 'Algorithm', nor are there structured, code-like blocks describing a procedure. |
| Open Source Code | Yes | The code is available at https://github.com/walkerning/aw_nas [24]. |
| Open Datasets | Yes | In this paper, we conduct an extensive and organized assessment of OSEs and ZSEs on five NAS benchmarks: NAS-Bench-101/201/301, and NDS ResNet/ResNeXt-A. |
| Dataset Splits | Yes | We inspect OSEs ranking quality when using different numbers of validation data batches to evaluate the OS scores, and find that on both NB201/NB301, using more data improves the estimation quality. Specifically, we compute the average OS accuracies over N validation batches, where each batch contains 128 examples. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Unless otherwise noted, MC sample S=1 is used in the experiments. And all training and evaluation settings are summarized in Appendix D. ... Specifically, we compute the average OS accuracies over N validation batches, where each batch contains 128 examples. |
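
The Dataset Splits and Experiment Setup rows quote the paper as averaging one-shot (OS) accuracies over N validation batches of 128 examples each. A minimal sketch of that averaging step follows, for illustration only. It is not taken from the paper or from the aw_nas code base; the function name `average_os_accuracy` and the input format (a list of correct-prediction counts, one per batch) are assumptions made here.

```python
from typing import Sequence

BATCH_SIZE = 128  # batch size stated in the quoted passage


def average_os_accuracy(
    correct_per_batch: Sequence[int],
    n_batches: int,
    batch_size: int = BATCH_SIZE,
) -> float:
    """Mean accuracy over the first `n_batches` validation batches.

    `correct_per_batch` is a hypothetical input: the number of correctly
    classified examples in each validation batch for one architecture.
    """
    if not 1 <= n_batches <= len(correct_per_batch):
        raise ValueError("n_batches must be between 1 and the number of available batches")
    used = correct_per_batch[:n_batches]
    # Accuracy of each batch, then the plain average across the N batches used.
    per_batch_accuracy = [correct / batch_size for correct in used]
    return sum(per_batch_accuracy) / n_batches


# Made-up counts for four validation batches of 128 examples each.
counts = [96, 64, 128, 32]
score_one_batch = average_os_accuracy(counts, n_batches=1)
score_four_batches = average_os_accuracy(counts, n_batches=4)
```

With these made-up counts, the single-batch estimate is 0.75 and the four-batch estimate is 0.625. The gap between them shows how an estimate from few batches can differ from one that uses more data.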