reproducibilityindex.ai

Evaluating Efficient Performance Estimators of Neural Architectures

Authors: Xuefei Ning, Changcheng Tang, Wenshuo Li, Zixuan Zhou, Shuang Liang, Huazhong Yang, Yu Wang

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this paper, we conduct an extensive and organized assessment of OSEs and ZSEs on five NAS benchmarks: NAS-Bench-101/201/301, and NDS ResNet/ResNeXt-A. Specifically, we employ a set of NAS-oriented criteria to study the behavior of OSEs and ZSEs, and reveal their biases and variances.
Researcher Affiliation	Collaboration	(1) Department of Electronic Engineering, Tsinghua University; (2) Novauto Technology Co. Ltd.
Pseudocode	No	The paper does not contain any sections explicitly labeled as 'Pseudocode' or 'Algorithm', nor are there structured, code-like blocks describing a procedure.
Open Source Code	Yes	The code is available at https://github.com/walkerning/aw_nas [24].
Open Datasets	Yes	In this paper, we conduct an extensive and organized assessment of OSEs and ZSEs on five NAS benchmarks: NAS-Bench-101/201/301, and NDS ResNet/ResNeXt-A.
Dataset Splits	Yes	We inspect OSEs' ranking quality when using different numbers of validation data batches to evaluate the OS scores, and find that on both NB201/NB301, using more data improves the estimation quality. Specifically, we compute the average OS accuracies over N validation batches, where each batch contains 128 examples.
Hardware Specification	No	The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory.
Software Dependencies	No	The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup	Yes	Unless otherwise noted, MC sample S=1 is used in the experiments. And all training and evaluation settings are summarized in Appendix D. ... Specifically, we compute the average OS accuracies over N validation batches, where each batch contains 128 examples.
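
The evaluation step quoted above (averaging one-shot accuracies over N validation batches) can be illustrated with a minimal sketch. This is not the paper's or aw_nas's implementation: the function names `batch_accuracy` and `average_os_accuracy` are hypothetical, and it assumes each batch is already available as a pair of predicted and true label sequences.

```python
from itertools import islice
from typing import Iterable, Sequence, Tuple

# Hypothetical batch format: (predicted labels, true labels).
# In the setting quoted above, each batch would hold 128 examples.
Batch = Tuple[Sequence[int], Sequence[int]]


def batch_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of examples in one batch whose prediction matches the label."""
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def average_os_accuracy(batches: Iterable[Batch], num_batches: int) -> float:
    """Average the per-batch accuracy over the first `num_batches` batches.

    Each batch contributes equally to the mean; this is an assumption of
    this sketch, not a detail confirmed by the source.
    """
    accuracies = [
        batch_accuracy(preds, labels)
        for preds, labels in islice(batches, num_batches)
    ]
    if not accuracies:
        raise ValueError("at least one validation batch is required")
    return sum(accuracies) / len(accuracies)


# Toy usage with two tiny batches (real batches would contain 128 examples).
toy_batches = [
    ([1, 0, 1, 1], [1, 0, 0, 1]),  # 3 of 4 correct
    ([0, 0], [0, 1]),              # 1 of 2 correct
]
score_n1 = average_os_accuracy(toy_batches, num_batches=1)
score_n2 = average_os_accuracy(toy_batches, num_batches=2)
```

Increasing `num_batches` uses more validation data per architecture, which is the quantity the quoted passage varies when studying estimation quality.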