Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A Study on Encodings for Neural Architecture Search
Authors: Colin White, Willie Neiswanger, Sam Nolen, Yash Savani
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present our experimental results. All of our experiments follow the Best Practices for NAS checklist [9]. We discuss our adherence to these practices in the full version of this paper. In particular, we release our code at https://github.com/naszilla/naszilla. We run experiments on three search spaces which we describe below. The NASBench-101 dataset [24] consists of approximately 423,000 neural architectures pretrained on CIFAR-10. |
| Researcher Affiliation | Collaboration | Colin White Abacus.AI San Francisco, CA 94103 EMAIL Willie Neiswanger Stanford University and Petuum, Inc. Stanford, CA 94305 EMAIL Sam Nolen Abacus.AI San Francisco, CA 94103 EMAIL Yash Savani Abacus.AI San Francisco, CA 94103 EMAIL |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/naszilla/naszilla. |
| Open Datasets | Yes | We run experiments on three search spaces which we describe below. The NASBench-101 dataset [24] consists of approximately 423,000 neural architectures pretrained on CIFAR-10. The NASBench-201 dataset [1] consists of 15,625 neural architectures separately trained on each of CIFAR-10, CIFAR-100, and ImageNet16-120. The DARTS [10] search space is used for large-scale cell-based NAS experiments on CIFAR-10. |
| Dataset Splits | Yes | We chose the configuration that minimizes the validation loss of the NAS algorithm after 200 queries. We also test the ability of a neural predictor to generalize to new search spaces, using a given encoding. Finally, for encodings in which multiple architectures can map to the same encoding, we evaluate the average standard deviation of accuracies for the equivalence class of architectures defined by each encoding. The neural predictor is trained on 1000 architectures and predicts the validation loss of the 5000 architectures from the test search space. |
| Hardware Specification | Yes | In each experiment, we report the test error of the neural network with the best validation error after time t, for t up to 130 TPU hours. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for its dependencies (e.g., PyTorch version, Python version). |
| Experiment Setup | Yes | Existing NAS algorithms may have hyperparameters that are optimized for a specific encoding, therefore, we perform hyperparameter tuning for each encoding. We just need to be careful that we do not perform hyperparameter tuning for specific datasets (in accordance with NAS best practices [9]). Therefore, we perform the hyperparameter search on CIFAR-100 from NAS-Bench-201, and apply the results on NAS-Bench-101. We defined a search region for each hyperparameter of each algorithm, and then for each encoding, we ran 50 iterations of random search on the full hyperparameter space. We chose the configuration that minimizes the validation loss of the NAS algorithm after 200 queries. |
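The hyperparameter tuning procedure quoted above (50 iterations of random search per encoding, selecting the configuration that minimizes the NAS algorithm's validation loss after 200 queries) can be sketched as below. This is a minimal illustration, not the authors' implementation: the search region, the `run_nas` objective, and all hyperparameter names here are hypothetical stand-ins; the actual per-algorithm ranges and NAS loop are in the paper's code release at https://github.com/naszilla/naszilla.

```python
import random

# Hypothetical search region for one NAS algorithm's hyperparameters.
# The real per-algorithm regions are defined in the naszilla repository.
SEARCH_REGION = {
    "learning_rate": (1e-4, 1e-1),
    "mutation_rate": (0.1, 1.0),
}

def sample_config(region, rng):
    """Draw one configuration uniformly at random from the search region."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in region.items()}

def run_nas(config, n_queries=200):
    """Stand-in for running the NAS algorithm under a given encoding and
    returning its best validation loss after n_queries queries.
    Replaced here by a deterministic toy objective for illustration."""
    return (config["learning_rate"] - 0.01) ** 2 + (config["mutation_rate"] - 0.5) ** 2

def tune(region, n_iters=50, seed=0):
    """Random-search tuning as described in the paper: n_iters random
    configurations, keeping the one with the lowest validation loss
    after the NAS run's query budget."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(n_iters):
        config = sample_config(region, rng)
        loss = run_nas(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss
```

Per the quoted setup, tuning of this form would be run once on CIFAR-100 from NAS-Bench-201 and the selected configuration then reused on NAS-Bench-101, so that no hyperparameters are tuned on the evaluation dataset itself.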