Stronger NAS with Weaker Predictors
Authors: Junru Wu, Xiyang Dai, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Ye Yu, Zhangyang Wang, Zicheng Liu, Mei Chen, Lu Yuan
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that WeakNAS costs fewer samples to find top-performance architectures on NAS-Bench-101 and NAS-Bench-201. Compared to state-of-the-art (SOTA) predictor-based NAS methods, WeakNAS outperforms all of them with notable margins, e.g., requiring at least 7.5x fewer samples to find the global optimum on NAS-Bench-101. WeakNAS can also absorb their ideas to boost performance further. Moreover, WeakNAS achieves a new SOTA result of 81.3% in the ImageNet MobileNet search space. |
| Researcher Affiliation | Collaboration | Texas A&M University, Microsoft Corporation, University of Texas at Austin |
| Pseudocode | No | The paper describes the steps of its iterative process under "Implementation Outline" in Section 2.2, but they are presented in paragraph form rather than as formal pseudocode or an algorithm block (a hedged sketch of this loop is given after the table). |
| Open Source Code | Yes | The code is available at: https://github.com/VITA-Group/WeakNAS. |
| Open Datasets | Yes | We used publicly available data and cited the corresponding papers, including CIFAR-10 [33], CIFAR-100 [33], ImageNet16-120 [34], ImageNet [37], NAS-Bench-101 [32], NAS-Bench-201 [31], OFA [36], and the timm library [39]. |
| Dataset Splits | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See main paper Section 2.4 for the predictor setup, Section 3 for the detailed setup on each dataset, and the supplemental material for ImageNet training details. |
| Hardware Specification | Yes | Setup: For all experiments, we use an Intel Xeon E5-2650v4 CPU and a single Tesla P100 GPU, and use the multilayer perceptron (MLP) as our default NAS predictor, unless otherwise specified. ... we adopt PyTorch and the image models library (timm) [39] to implement our models and conduct all ImageNet experiments using 8 Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions software such as PyTorch, the image models library (timm) [39], and XGBoost [29], but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | For our weak predictor, we use a 4-layer MLP with hidden layer dimension of (1000, 1000, 1000, 1000). ... we use the Gradient Boosting Regression Tree (GBRT) based on XGBoost [29], consisting of 1000 trees. ... we use a random forest consisting of 1000 Forests. ... In Table 1, we initialize the initial Weak Predictor f1 with 100 random samples, and set M = 10, after progressively adding more weak predictors (from 1 to 191)... (Illustrative sketches of the MLP search loop and the GBRT predictor follow this table.) |
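
The iterative procedure described in Section 2.2 of the paper can be summarized in a short sketch. The following is a minimal, illustrative Python sketch under stated assumptions: `search_space` is a tabular benchmark (e.g., NAS-Bench-101) where `evaluate(arch)` returns the measured accuracy and `encode(arch)` returns a fixed-length feature vector; helper names, epochs, and learning rate are assumptions, not the authors' released implementation. Only the values quoted above (a 4-layer MLP with hidden width 1000, 100 initial samples, M = 10) come from the paper.

```python
# Minimal sketch of the progressive weak-predictor search loop (cf. Section 2.2).
# Assumptions: a tabular benchmark where evaluate(arch) returns measured accuracy
# and encode(arch) returns a fixed-length feature vector; helper names, epochs,
# and learning rate are illustrative, not the authors' implementation.
import random
import torch
import torch.nn as nn


def build_mlp(encoding_dim, hidden_dim=1000):
    # 4-layer MLP predictor with hidden dims (1000, 1000, 1000, 1000), per the quoted setup
    return nn.Sequential(
        nn.Linear(encoding_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, 1),
    )


def fit_weak_predictor(encodings, accuracies, epochs=100, lr=1e-3):
    # Fit one weak predictor on all architectures evaluated so far
    x = torch.tensor(encodings, dtype=torch.float32)
    y = torch.tensor(accuracies, dtype=torch.float32).unsqueeze(-1)
    model = build_mlp(x.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return model


def weak_nas_search(search_space, encode, evaluate, n_init=100, m=10, iterations=190):
    # Progressively train weak predictors; each iteration queries the top-M predictions
    evaluated = {}
    for arch in random.sample(search_space, n_init):   # initial random samples
        evaluated[arch] = evaluate(arch)

    for _ in range(iterations):
        archs = list(evaluated)
        predictor = fit_weak_predictor([encode(a) for a in archs],
                                       [evaluated[a] for a in archs])
        candidates = [a for a in search_space if a not in evaluated]
        if not candidates:
            break
        with torch.no_grad():
            scores = predictor(torch.tensor([encode(a) for a in candidates],
                                            dtype=torch.float32)).squeeze(-1)
        # Evaluate the top-M architectures predicted by the current weak predictor,
        # so the next predictor's training set moves towards the promising region
        top_idx = torch.topk(scores, k=min(m, len(candidates))).indices.tolist()
        for i in top_idx:
            evaluated[candidates[i]] = evaluate(candidates[i])

    return max(evaluated, key=evaluated.get)
```

On a tabular benchmark `evaluate` is a lookup, so the cost that matters is the number of queried architectures, which is the sample budget the paper reports.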
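The setup row also mentions GBRT and random-forest variants of the weak predictor. Below is a hedged sketch of the XGBoost-based GBRT variant, reusing the `encode`/`evaluate` helpers assumed above; only `n_estimators=1000` comes from the quoted setup, and the helper name and remaining defaults are illustrative.

```python
# Hedged sketch of the GBRT weak-predictor variant (XGBoost, 1000 trees).
# The helper name and interface mirror fit_weak_predictor above and are assumptions.
import numpy as np
import xgboost as xgb


def fit_gbrt_predictor(encodings, accuracies):
    model = xgb.XGBRegressor(n_estimators=1000)  # 1000 regression trees, per the quoted setup
    model.fit(np.asarray(encodings, dtype=np.float32),
              np.asarray(accuracies, dtype=np.float32))
    return model

# Inside weak_nas_search, replace fit_weak_predictor(...) with fit_gbrt_predictor(...)
# and score candidates with model.predict(...) instead of a torch forward pass.
```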