Construction of Hierarchical Neural Architecture Search Spaces based on Context-free Grammars
Authors: Simon Schrodi, Danny Stoll, Binxin Ru, Rhea Sukthanker, Thomas Brox, Frank Hutter
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the versatility of our search space design framework and show that our search strategy can be superior to existing NAS approaches. Code is available at https://github.com/automl/hierarchical_nas_construction. |
| Researcher Affiliation | Academia | ¹University of Freiburg, ²University of Oxford. {schrodi,stolld,sukthank,brox,fh}@cs.uni-freiburg.de, robin@robots.ox.ac.uk |
| Pseudocode | Yes | Algorithm 1 Bayesian Optimization algorithm [90]. Input: Initial observed data Dt, a black-box objective function f, total number of BO iterations T. Output: The best recommendation about the global optimizer x*. for t = 1, . . . , T do: Select the next xt+1 by maximizing the acquisition function α(x | Dt); Evaluate the objective function at ft+1 = f(xt+1); Dt+1 ← Dt ∪ {(xt+1, ft+1)}; Update the surrogate model with Dt+1; end for |
| Open Source Code | Yes | Code is available at https://github.com/automl/hierarchical_nas_construction. |
| Open Datasets | Yes | We evaluated all search strategies on CIFAR-10/100 [93], ImageNet-16-120 [94], CIFARTile, and AddNIST [95]. |
| Dataset Splits | Yes | For CIFAR-10, we split the original training set into a new training set with 25k images and validation set with 25k images for search following [58]. |
| Hardware Specification | Yes | All search experiments used 8 asynchronous workers, each with a single NVIDIA RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python, PyTorch, or specific libraries). |
| Experiment Setup | Yes | For training of architectures on CIFAR-10/100 and ImageNet-16-120, we followed the training protocol of Dong and Yang [58]. We trained architectures with SGD with a learning rate of 0.1, Nesterov momentum of 0.9, weight decay of 0.0005 with cosine annealing [96], and a batch size of 256 for 200 epochs. |
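
As a reading aid for the Pseudocode row, the following is a minimal Python sketch of the Bayesian optimization loop quoted from Algorithm 1. The `objective`, `surrogate`, `acquisition`, and `candidates` objects are hypothetical placeholders, not the paper's actual surrogate or acquisition function, and a finite candidate pool is assumed for brevity.

```python
def bayesian_optimization(objective, surrogate, acquisition, candidates, init_data, iterations):
    """Sketch of the BO loop in Algorithm 1 over a finite candidate pool (assumption)."""
    data = list(init_data)      # observed (x, f(x)) pairs
    surrogate.fit(data)         # initial surrogate fit on the observed data
    for _ in range(iterations):
        # Select the next point by maximizing the acquisition function under the surrogate.
        x_next = max(candidates, key=lambda x: acquisition(x, surrogate))
        # Evaluate the black-box objective (e.g., validation error of an architecture).
        f_next = objective(x_next)
        # Augment the observed data and update the surrogate model.
        data.append((x_next, f_next))
        surrogate.fit(data)
    # Recommend the best observation, assuming a minimization objective such as validation error.
    return min(data, key=lambda pair: pair[1])
```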
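The Dataset Splits row quotes a 25k/25k train/validation split of CIFAR-10 for search. A minimal torchvision sketch of such a split is given below; the use of `random_split` with a fixed seed is an assumption made here for illustration, since the paper follows the split of Dong and Yang [58] rather than this exact code.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# CIFAR-10 has 50,000 training images; split them into 25k for search training
# and 25k for validation, mirroring the split described in the paper.
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
generator = torch.Generator().manual_seed(0)  # fixed seed; the paper's exact split may differ
search_train, search_valid = random_split(full_train, [25_000, 25_000], generator=generator)
```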
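The Experiment Setup row lists the reported training hyperparameters. The PyTorch sketch below assembles them into a plain training loop; `model`, `train_set`, the cross-entropy loss, and per-epoch scheduler stepping are assumptions for illustration, not the authors' implementation.

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_set, epochs=200, batch_size=256, device="cuda"):
    """Train with SGD (lr 0.1, Nesterov momentum 0.9, weight decay 5e-4) and cosine annealing."""
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=4)
    optimizer = torch.optim.SGD(
        model.parameters(), lr=0.1, momentum=0.9, nesterov=True, weight_decay=5e-4
    )
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    criterion = torch.nn.CrossEntropyLoss()  # assumed loss; not stated in the quoted setup
    model.to(device).train()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()  # cosine annealing stepped once per epoch (assumption)
    return model
```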