Stronger NAS with Weaker Predictors

Authors: Junru Wu, Xiyang Dai, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Ye Yu, Zhangyang Wang, Zicheng Liu, Mei Chen, Lu Yuan

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that WeakNAS requires fewer samples to find top-performing architectures on NAS-Bench-101 and NAS-Bench-201. Compared to state-of-the-art (SOTA) predictor-based NAS methods, WeakNAS outperforms all of them with notable margins, e.g., requiring at least 7.5x fewer samples to find the global optimum on NAS-Bench-101. WeakNAS can also absorb their ideas to boost performance further. Furthermore, WeakNAS achieves a new SOTA result of 81.3% in the ImageNet MobileNet search space.
Researcher Affiliation | Collaboration | (1) Texas A&M University, (2) Microsoft Corporation, (3) University of Texas at Austin
Pseudocode | No | The paper describes the steps of its iterative process under "Implementation Outline" in Section 2.2, but only in paragraph form, not as a formal pseudocode or algorithm block (a hedged sketch of this loop is given after this table).
Open Source Code | Yes | The code is available at: https://github.com/VITA-Group/WeakNAS.
Open Datasets | Yes | We used publicly available data and cited the corresponding papers, including CIFAR-10 [33], CIFAR-100 [33], ImageNet16-120 [34], ImageNet [37], NAS-Bench-101 [32], NAS-Bench-201 [31], OFA [36], and the timm library [39].
Dataset Splits | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See main paper Section 2.4 for the predictor setup, Section 3 for the detailed setup on each dataset, and the supplemental material for ImageNet training details.
Hardware Specification | Yes | Setup: For all experiments, we use an Intel Xeon E5-2650v4 CPU and a single Tesla P100 GPU, and use the multilayer perceptron (MLP) as our default NAS predictor, unless otherwise specified. ... we adopt PyTorch and the image models library (timm) [39] to implement our models and conduct all ImageNet experiments using 8 Tesla V100 GPUs.
Software Dependencies | No | The paper mentions software such as PyTorch, the image models library (timm) [39], and XGBoost [29], but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | For our weak predictor, we use a 4-layer MLP with hidden layer dimensions of (1000, 1000, 1000, 1000). ... we use the Gradient Boosting Regression Tree (GBRT) based on XGBoost [29], consisting of 1000 trees. ... we use a random forest consisting of 1000 Forests. ... In Table 1, we initialize the initial weak predictor f1 with 100 random samples and set M = 10, after progressively adding more weak predictors (from 1 to 191)... (illustrative predictor instantiations are sketched after this table).
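
Since the paper gives its iterative procedure only as prose (Section 2.2, "Implementation Outline"), the following is a minimal Python sketch of how such a progressive weak-predictor loop could look, assuming a NAS-Bench-style search space whose true accuracies can be queried directly. The names `encode` and `query_accuracy`, the use of scikit-learn's `MLPRegressor`, and the greedy top-M selection are illustrative assumptions standing in for the authors' exact sampling scheme; their implementation lives in the linked repository.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def weak_predictor_search(search_space, encode, query_accuracy,
                          n_init=100, m_per_iter=10, n_iters=190, seed=0):
    """Progressive weak-predictor search (illustrative sketch, not the official code).

    search_space   : list of candidate architectures (e.g. NAS-Bench-101 cells)
    encode         : maps an architecture to a fixed-length feature vector (assumed helper)
    query_accuracy : returns the benchmarked accuracy of an architecture (assumed helper)
    """
    rng = np.random.default_rng(seed)
    # Seed the evaluation history with random architectures (100 in the paper's Table 1 setup).
    evaluated = {int(i): query_accuracy(search_space[i])
                 for i in rng.choice(len(search_space), size=n_init, replace=False)}

    for _ in range(n_iters):
        # Fit a weak predictor on everything evaluated so far.
        X = np.stack([encode(search_space[i]) for i in evaluated])
        y = np.array(list(evaluated.values()))
        predictor = MLPRegressor(hidden_layer_sizes=(1000, 1000, 1000, 1000),
                                 max_iter=200).fit(X, y)

        # Rank the not-yet-evaluated architectures and query the top M of them,
        # shrinking the search toward the currently promising region.
        remaining = [i for i in range(len(search_space)) if i not in evaluated]
        scores = predictor.predict(np.stack([encode(search_space[i]) for i in remaining]))
        for j in np.argsort(scores)[::-1][:m_per_iter]:
            idx = remaining[j]
            evaluated[idx] = query_accuracy(search_space[idx])

    best = max(evaluated, key=evaluated.get)
    return search_space[best], evaluated[best]
```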
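
For the predictor configurations quoted in the Experiment Setup row, a minimal sketch using scikit-learn and XGBoost is shown below. The paper's own predictors are implemented in its repository, so these off-the-shelf instantiations are only assumptions that match the quoted sizes; in particular, the quoted "1000 Forests" is read here as 1000 trees, which is an assumption.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

# Default predictor: a 4-layer MLP with hidden layer dimensions (1000, 1000, 1000, 1000).
mlp_predictor = MLPRegressor(hidden_layer_sizes=(1000, 1000, 1000, 1000))

# Gradient Boosting Regression Tree (GBRT) predictor via XGBoost with 1000 trees.
gbrt_predictor = XGBRegressor(n_estimators=1000)

# Random forest predictor; the quoted "1000 Forests" is read as 1000 trees (assumption).
rf_predictor = RandomForestRegressor(n_estimators=1000)
```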