Geometry-Aware Gradient Algorithms for Neural Architecture Search

Authors: Liam Li, Mikhail Khodak, Nina Balcan, Ameet Talwalkar

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "empirically, we show that our solution leads to strong improvements on several NAS benchmarks. Notably, we exceed the best published results for both CIFAR and ImageNet on both the DARTS search space and NAS-Bench-201; on the latter we achieve near-oracle-optimal performance on CIFAR-10 and CIFAR-100."
Researcher Affiliation | Collaboration | "Liam Li (1), Mikhail Khodak (2), Maria-Florina Balcan (2), and Ameet Talwalkar (1,2); 1: Determined AI, 2: Carnegie Mellon University"
Pseudocode | Yes | "Algorithm 1: Block-stochastic mirror descent optimization of a function f : R^d × Θ ↦ R." (A minimal sketch of this block-wise update appears after the table.)
Open Source Code | Yes | "Code to obtain these results has been made available in the supplementary material."
Open Datasets | Yes | "We evaluate GAEA on three different computer vision benchmarks: the large and heavily studied search space from DARTS (Liu et al., 2019) and two smaller oracle evaluation benchmarks, NAS-Bench-1Shot1 (Zela et al., 2020a) and NAS-Bench-201 (Dong & Yang, 2020)." "CIFAR-10 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015)."
Dataset Splits | Yes | "On ImageNet, we follow Xu et al. (2020) by using subsamples containing 10% and 2.5% of the training images from ILSVRC-2012 (Russakovsky et al., 2015) as training and validation sets, respectively." (A sketch of building such a split appears after the table.)
Hardware Specification | Yes | "Search cost is hardware-dependent; we used Tesla V100 GPUs." "Search cost measured on NVIDIA P100 GPUs."
Software Dependencies | No | The paper lists general training parameters (e.g., 'scheduler: cosine', 'batch_size') but does not specify versions for software libraries (e.g., PyTorch, TensorFlow) or for the programming language (e.g., Python).
Experiment Setup | Yes | "All hyperparameters for training the weight-sharing network are the same as that used by PC-DARTS: train: scheduler: cosine, lr_anneal_cycles: 1, smooth_cross_entropy: false, batch_size: 256, learning_rate: 0.1, learning_rate_min: 0.0, momentum: 0.9, weight_decay: 0.0003, init_channels: 16, layers: 8, autoaugment: false, cutout: false, auxiliary: false, drop_path_prob: 0, grad_clip: 5" (A structured rendering of this configuration follows the table.)
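Sketch for the pseudocode row: Algorithm 1 in the paper is block-stochastic mirror descent, which alternates stochastic gradient steps across blocks of variables and, for the simplex-constrained block, uses an exponentiated-gradient (entropic mirror descent) update. The snippet below is only a minimal illustration on a toy objective, not the authors' implementation; the objective f, the step sizes, and the block ordering are placeholders.

```python
import numpy as np

def exponentiated_gradient_step(theta, grad, lr):
    """Entropic mirror-descent step: multiplicative update, then renormalize
    so that theta stays on the probability simplex."""
    theta = theta * np.exp(-lr * grad)
    return theta / theta.sum()

# Toy objective f(w, theta) = ||A @ theta - w||^2, with theta on the simplex
# standing in for architecture parameters and w for unconstrained weights.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
w = rng.normal(size=5)              # unconstrained block: plain SGD
theta = np.full(3, 1.0 / 3.0)       # simplex block: mirror descent

lr_w, lr_theta = 0.05, 0.1
for _ in range(200):
    residual = A @ theta - w
    # Block 1: gradient step on the unconstrained weights (df/dw = -2 * residual).
    w -= lr_w * (-2.0 * residual)
    # Block 2: exponentiated-gradient step on theta (df/dtheta = 2 * A^T residual).
    theta = exponentiated_gradient_step(theta, 2.0 * A.T @ residual, lr_theta)

print(theta, theta.sum())  # theta remains nonnegative and sums to 1
```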
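Sketch for the dataset-splits row: the ImageNet protocol quoted above (following Xu et al., 2020) uses 10% of the ILSVRC-2012 training images for search-time training and 2.5% for validation. One plausible way to build such index-level splits is sketched below; the directory path, the use of torchvision's ImageFolder, and the class-balanced sampling are assumptions for illustration, not details confirmed by the paper.

```python
import random
from collections import defaultdict

from torch.utils.data import Subset
from torchvision import datasets, transforms

# Path and transform are placeholders, not values taken from the paper.
full_train = datasets.ImageFolder(
    "/data/ilsvrc2012/train", transform=transforms.ToTensor()
)

# Group image indices by class so the subsample keeps the class balance
# (the per-class balancing is an assumption).
by_class = defaultdict(list)
for idx, (_, label) in enumerate(full_train.samples):
    by_class[label].append(idx)

rng = random.Random(0)
train_idx, val_idx = [], []
for label, indices in by_class.items():
    rng.shuffle(indices)
    n = len(indices)
    n_train, n_val = int(0.10 * n), int(0.025 * n)  # 10% search-train, 2.5% search-val
    train_idx += indices[:n_train]
    val_idx += indices[n_train:n_train + n_val]

search_train = Subset(full_train, train_idx)
search_val = Subset(full_train, val_idx)
```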
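Structured rendering of the experiment-setup row: the values below are exactly those quoted from the PC-DARTS-style training configuration; only the Python dict layout (mirroring what is presumably a YAML config in the released code) is an assumed rendering.

```python
# Values copied from the Experiment Setup row; the nesting is an assumption.
train_config = {
    "train": {
        "scheduler": "cosine",
        "lr_anneal_cycles": 1,
        "smooth_cross_entropy": False,
        "batch_size": 256,
        "learning_rate": 0.1,
        "learning_rate_min": 0.0,
        "momentum": 0.9,
        "weight_decay": 0.0003,
        "init_channels": 16,
        "layers": 8,
        "autoaugment": False,
        "cutout": False,
        "auxiliary": False,
        "drop_path_prob": 0,
        "grad_clip": 5,
    }
}
```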