Geometry-Aware Gradient Algorithms for Neural Architecture Search

Authors: Liam Li, Mikhail Khodak, Nina Balcan, Ameet Talwalkar

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "empirically, we show that our solution leads to strong improvements on several NAS benchmarks. Notably, we exceed the best published results for both CIFAR and ImageNet on both the DARTS search space and NAS-Bench-201; on the latter we achieve near-oracle-optimal performance on CIFAR-10 and CIFAR-100."
Researcher Affiliation | Collaboration | "Liam Li (1), Mikhail Khodak (2), Maria-Florina Balcan (2), and Ameet Talwalkar (1,2); 1: Determined AI, 2: Carnegie Mellon University"
Pseudocode | Yes | "Algorithm 1: Block-stochastic mirror descent optimization of a function f : R^d × Θ ↦ R." (A minimal sketch of this block-wise update appears after the table.)
Open Source Code | Yes | "Code to obtain these results has been made available in the supplementary material."
Open Datasets | Yes | "We evaluate GAEA on three different computer vision benchmarks: the large and heavily studied search space from DARTS (Liu et al., 2019) and two smaller oracle evaluation benchmarks, NAS-Bench-1Shot1 (Zela et al., 2020a) and NAS-Bench-201 (Dong & Yang, 2020)." "CIFAR-10 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015)."
Dataset Splits | Yes | "On ImageNet, we follow Xu et al. (2020) by using subsamples containing 10% and 2.5% of the training images from ILSVRC-2012 (Russakovsky et al., 2015) as training and validation sets, respectively." (A sketch of building such a split appears after the table.)
Hardware Specification | Yes | "Search cost is hardware-dependent; we used Tesla V100 GPUs." "Search cost measured on NVIDIA P100 GPUs."
Software Dependencies | No | The paper lists general training parameters (e.g., 'scheduler: cosine', 'batch_size') but does not specify versions for software libraries (e.g., PyTorch, TensorFlow) or for the programming language (e.g., Python).
Experiment Setup | Yes | "All hyperparameters for training the weight-sharing network are the same as that used by PC-DARTS: train: scheduler: cosine, lr_anneal_cycles: 1, smooth_cross_entropy: false, batch_size: 256, learning_rate: 0.1, learning_rate_min: 0.0, momentum: 0.9, weight_decay: 0.0003, init_channels: 16, layers: 8, autoaugment: false, cutout: false, auxiliary: false, drop_path_prob: 0, grad_clip: 5" (A structured rendering of this configuration follows the table.)
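Sketch for the pseudocode row: Algorithm 1 in the paper is block-stochastic mirror descent, which alternates stochastic gradient steps across blocks of variables and, for the simplex-constrained block, uses an exponentiated-gradient (entropic mirror descent) update. The snippet below is only a minimal illustration on a toy objective, not the authors' implementation; the objective f, the step sizes, and the block ordering are placeholders.

```python
import numpy as np

def exponentiated_gradient_step(theta, grad, lr):
    """Entropic mirror-descent step: multiplicative update, then renormalize
    so that theta stays on the probability simplex."""
    theta = theta * np.exp(-lr * grad)
    return theta / theta.sum()

# Toy objective f(w, theta) = ||A @ theta - w||^2, with theta on the simplex
# standing in for architecture parameters and w for unconstrained weights.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
w = rng.normal(size=5)              # unconstrained block: plain SGD
theta = np.full(3, 1.0 / 3.0)       # simplex block: mirror descent

lr_w, lr_theta = 0.05, 0.1
for _ in range(200):
    residual = A @ theta - w
    # Block 1: gradient step on the unconstrained weights (df/dw = -2 * residual).
    w -= lr_w * (-2.0 * residual)
    # Block 2: exponentiated-gradient step on theta (df/dtheta = 2 * A^T residual).
    theta = exponentiated_gradient_step(theta, 2.0 * A.T @ residual, lr_theta)

print(theta, theta.sum())  # theta remains nonnegative and sums to 1
```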
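Sketch for the dataset-splits row: the ImageNet protocol quoted above (following Xu et al., 2020) uses 10% of the ILSVRC-2012 training images for search-time training and 2.5% for validation. One plausible way to build such index-level splits is sketched below; the directory path, the use of torchvision's ImageFolder, and the class-balanced sampling are assumptions for illustration, not details confirmed by the paper.

```python
import random
from collections import defaultdict

from torch.utils.data import Subset
from torchvision import datasets, transforms

# Path and transform are placeholders, not values taken from the paper.
full_train = datasets.ImageFolder(
    "/data/ilsvrc2012/train", transform=transforms.ToTensor()
)

# Group image indices by class so the subsample keeps the class balance
# (the per-class balancing is an assumption).
by_class = defaultdict(list)
for idx, (_, label) in enumerate(full_train.samples):
    by_class[label].append(idx)

rng = random.Random(0)
train_idx, val_idx = [], []
for label, indices in by_class.items():
    rng.shuffle(indices)
    n = len(indices)
    n_train, n_val = int(0.10 * n), int(0.025 * n)  # 10% search-train, 2.5% search-val
    train_idx += indices[:n_train]
    val_idx += indices[n_train:n_train + n_val]

search_train = Subset(full_train, train_idx)
search_val = Subset(full_train, val_idx)
```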
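Structured rendering of the experiment-setup row: the values below are exactly those quoted from the PC-DARTS-style training configuration; only the Python dict layout (mirroring what is presumably a YAML config in the released code) is an assumed rendering.

```python
# Values copied from the Experiment Setup row; the nesting is an assumption.
train_config = {
    "train": {
        "scheduler": "cosine",
        "lr_anneal_cycles": 1,
        "smooth_cross_entropy": False,
        "batch_size": 256,
        "learning_rate": 0.1,
        "learning_rate_min": 0.0,
        "momentum": 0.9,
        "weight_decay": 0.0003,
        "init_channels": 16,
        "layers": 8,
        "autoaugment": False,
        "cutout": False,
        "auxiliary": False,
        "drop_path_prob": 0,
        "grad_clip": 5,
    }
}
```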