Geometry-Aware Gradient Algorithms for Neural Architecture Search
Authors: Liam Li, Mikhail Khodak, Nina Balcan, Ameet Talwalkar
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that our solution leads to strong improvements on several NAS benchmarks. Notably, we exceed the best published results for both CIFAR and ImageNet on both the DARTS search space and NAS-Bench-201; on the latter we achieve near-oracle-optimal performance on CIFAR-10 and CIFAR-100. |
| Researcher Affiliation | Collaboration | Liam Li¹, Mikhail Khodak², Maria-Florina Balcan², and Ameet Talwalkar¹,²; ¹Determined AI, ²Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1: Block-stochastic mirror descent optimization of a function f : ℝ^d × Θ ↦ ℝ (a minimal sketch of the architecture update follows the table). |
| Open Source Code | Yes | Code to obtain these results has been made available in the supplementary material. |
| Open Datasets | Yes | We evaluate GAEA on three different computer vision benchmarks: the large and heavily studied search space from DARTS (Liu et al., 2019) and two smaller oracle evaluation benchmarks, NAS-Bench-1Shot1 (Zela et al., 2020a), and NAS-Bench-201 (Dong & Yang, 2020). CIFAR-10 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015). |
| Dataset Splits | Yes | On ImageNet, we follow Xu et al. (2020) by using subsamples containing 10% and 2.5% of the training images from ILSVRC-2012 (Russakovsky et al., 2015) as training and validation sets, respectively (a subsampling sketch follows the table). |
| Hardware Specification | Yes | Search cost is hardware-dependent; we used Tesla V100 GPUs. Search cost measured on NVIDIA P100 GPUs. |
| Software Dependencies | No | The paper lists general training parameters like 'scheduler: cosine' and 'batch_size', but does not specify software versions for libraries (e.g., PyTorch, TensorFlow) or programming languages (e.g., Python). |
| Experiment Setup | Yes | All hyperparameters for training the weight-sharing network are the same as that used by PC-DARTS (the `train:` block of the config): `scheduler: cosine`, `lr_anneal_cycles: 1`, `smooth_cross_entropy: false`, `batch_size: 256`, `learning_rate: 0.1`, `learning_rate_min: 0.0`, `momentum: 0.9`, `weight_decay: 0.0003`, `init_channels: 16`, `layers: 8`, `autoaugment: false`, `cutout: false`, `auxiliary: false`, `drop_path_prob: 0`, `grad_clip: 5` (restated as a Python dict below the table). |
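The Algorithm 1 excerpt above refers to block-stochastic mirror descent, which in GAEA amounts to alternating stochastic gradient steps on the shared weights with exponentiated-gradient steps on simplex-constrained architecture parameters. The snippet below is a minimal PyTorch sketch of the architecture-parameter update only; the tensor shapes, learning rate, and function names are illustrative assumptions, not the authors' released implementation.

```python
import torch

def exponentiated_gradient_step(theta, grad, lr=0.1):
    """One mirror-descent step with the entropic regularizer: a multiplicative
    update followed by renormalization, so each row of `theta` remains a
    probability distribution over candidate operations (one row per edge).

    theta: [num_edges, num_ops] architecture parameters on the simplex
    grad:  gradient of the validation loss with respect to theta
    lr:    architecture learning rate (illustrative value, not from the paper)
    """
    with torch.no_grad():
        theta = theta * torch.exp(-lr * grad)            # multiplicative update
        theta = theta / theta.sum(dim=-1, keepdim=True)  # re-project onto each simplex
    return theta

# Illustrative usage with DARTS-like sizes: 14 edges, 8 candidate operations.
theta = torch.full((14, 8), 1.0 / 8)   # start from the uniform distribution
grad = torch.randn(14, 8)              # stand-in for a validation-loss gradient
theta = exponentiated_gradient_step(theta, grad)
assert torch.allclose(theta.sum(dim=-1), torch.ones(14))
```

In the full algorithm these multiplicative updates alternate with ordinary SGD steps on the weight-sharing network's weights, one block at a time.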
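The ImageNet split quoted in the Dataset Splits row follows Xu et al. (2020): 10% of the ILSVRC-2012 training images serve as the proxy training set and 2.5% as the proxy validation set during search. Below is a hedged sketch of drawing two disjoint subsets with torchvision; the directory path, transform, and per-class balancing are assumptions rather than the paper's exact pipeline.

```python
import random
from collections import defaultdict

from torch.utils.data import Subset
from torchvision import datasets, transforms

def split_per_class(dataset, train_frac, val_frac, seed=0):
    """Draw two disjoint index sets covering roughly `train_frac` and `val_frac`
    of each class (class balancing is an assumption; the paper only states the
    overall 10% / 2.5% fractions)."""
    by_class = defaultdict(list)
    for idx, (_, label) in enumerate(dataset.samples):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * train_frac)
        n_val = int(len(indices) * val_frac)
        train_idx.extend(indices[:n_train])
        val_idx.extend(indices[n_train:n_train + n_val])
    return Subset(dataset, train_idx), Subset(dataset, val_idx)

# Hypothetical ILSVRC-2012 training directory; replace with the real path.
train_full = datasets.ImageFolder("/data/imagenet/train", transform=transforms.ToTensor())
search_train, search_val = split_per_class(train_full, 0.10, 0.025)
```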
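For readability, the flattened `train:` hyperparameter block quoted in the Experiment Setup row is restated below as a Python dictionary; the values are copied verbatim from the excerpt, and only the flat-dictionary structure is an assumption about the original YAML layout.

```python
# PC-DARTS weight-sharing training hyperparameters, restated from the quoted
# "train:" YAML fragment; values copied verbatim, structure assumed.
train_config = {
    "scheduler": "cosine",
    "lr_anneal_cycles": 1,
    "smooth_cross_entropy": False,
    "batch_size": 256,
    "learning_rate": 0.1,
    "learning_rate_min": 0.0,
    "momentum": 0.9,
    "weight_decay": 0.0003,
    "init_channels": 16,
    "layers": 8,
    "autoaugment": False,
    "cutout": False,
    "auxiliary": False,
    "drop_path_prob": 0,
    "grad_clip": 5,
}
```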