Construction of Hierarchical Neural Architecture Search Spaces based on Context-free Grammars

Authors: Simon Schrodi, Danny Stoll, Binxin Ru, Rhea Sukthanker, Thomas Brox, Frank Hutter

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the versatility of our search space design framework and show that our search strategy can be superior to existing NAS approaches. Code is available at https://github.com/automl/hierarchical_nas_construction.
Researcher Affiliation | Academia | 1 University of Freiburg, 2 University of Oxford; {schrodi,stolld,sukthank,brox,fh}@cs.uni-freiburg.de, robin@robots.ox.ac.uk
Pseudocode | Yes | Algorithm 1: Bayesian Optimization [90]. Input: initial observed data D_t, a black-box objective function f, total number of BO iterations T. Output: the best recommendation about the global optimizer x*. For t = 1, ..., T: select the next x_{t+1} by maximizing the acquisition function α(x | D_t); evaluate the objective function f_{t+1} = f(x_{t+1}); set D_{t+1} ← D_t ∪ {(x_{t+1}, f_{t+1})}; update the surrogate model with D_{t+1}. (A code sketch of this loop follows the table.)
Open Source Code | Yes | Code is available at https://github.com/automl/hierarchical_nas_construction.
Open Datasets | Yes | We evaluated all search strategies on CIFAR-10/100 [93], ImageNet-16-120 [94], CIFARTile, and AddNIST [95].
Dataset Splits | Yes | For CIFAR-10, we split the original training set into a new training set with 25k images and a validation set with 25k images for the search, following [58]. (A split sketch follows the table.)
Hardware Specification | Yes | All search experiments used 8 asynchronous workers, each with a single NVIDIA RTX 2080 Ti GPU.
Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python, PyTorch, or specific libraries).
Experiment Setup | Yes | For training architectures on CIFAR-10/100 and ImageNet-16-120, we followed the training protocol of Dong and Yang [58]: SGD with a learning rate of 0.1, Nesterov momentum of 0.9, weight decay of 0.0005 with a cosine annealing schedule [96], and a batch size of 256 for 200 epochs. (A training-setup sketch follows the table.)
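To make the quoted Algorithm 1 concrete, the following is a minimal sketch of the generic Bayesian Optimization loop in Python. The `surrogate`, `acquisition`, and `candidate_pool` objects are hypothetical placeholders, not the interfaces of the authors' repository, which searches over architectures derived from the grammar.

```python
def bayesian_optimization(f, surrogate, acquisition, candidate_pool, init_data, n_iters):
    """Run n_iters BO iterations and return the best observed (x, f(x)) pair."""
    data = list(init_data)                          # D_t: list of (x, f(x)) pairs
    surrogate.fit(data)                             # fit the surrogate to the initial data
    for _ in range(n_iters):
        # Select x_{t+1} by maximizing the acquisition function alpha(x | D_t).
        x_next = max(candidate_pool, key=lambda x: acquisition(surrogate, x))
        y_next = f(x_next)                          # evaluate the black-box objective
        data.append((x_next, y_next))               # D_{t+1} = D_t U {(x_{t+1}, f_{t+1})}
        surrogate.fit(data)                         # update the surrogate with D_{t+1}
    return min(data, key=lambda pair: pair[1])      # best recommendation (minimization)
```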
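The 25k/25k CIFAR-10 split from the Dataset Splits row could be reproduced roughly as below with torchvision; the seed is an assumption, since the paper follows the split protocol of [58] rather than a random split defined here.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Sketch of a 25k/25k train/validation split of the CIFAR-10 training set.
# The seed is assumed for illustration; the exact split indices may differ from [58].
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
generator = torch.Generator().manual_seed(0)
train_set, val_set = random_split(full_train, [25_000, 25_000], generator=generator)
```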
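The hyperparameters quoted in the Experiment Setup row map onto standard PyTorch components roughly as follows. `model` and `train_set` are placeholders, and details such as data augmentation are omitted; this is a sketch of the reported settings, not the authors' training script.

```python
import torch
from torch.utils.data import DataLoader

def make_training_setup(model, train_set, epochs=200):
    # SGD with lr 0.1, Nesterov momentum 0.9, weight decay 5e-4,
    # cosine-annealed learning rate, batch size 256, 200 epochs.
    loader = DataLoader(train_set, batch_size=256, shuffle=True, num_workers=4)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                                weight_decay=5e-4, nesterov=True)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return loader, optimizer, scheduler
```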