Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search

Authors: Youhei Akimoto, Shinichi Shirakawa, Nozomu Yoshinari, Kento Uchida, Shota Saito, Kouhei Nishida

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Despite its simplicity and no problem-dependent parameter tuning, our method exhibited near state-of-the-art performances with low computational budgets both on image classification and inpainting tasks."
Researcher Affiliation | Collaboration | University of Tsukuba & RIKEN AIP; Yokohama National University; Skill Up AI Co., Ltd.; Shinshu University
Pseudocode | Yes | Algorithm 1: ASNG-NAS
Open Source Code | Yes | "The code is available at https://github.com/shirakawas/ASNG-NAS."
Open Datasets | Yes | "We use the CIFAR-10 dataset and adopt the standard preprocessing and data augmentation as done in the previous works, e.g., Liu et al. (2019); Pham et al. (2018). We use the CelebFaces Attributes Dataset (CelebA) (Liu et al., 2015)."
Dataset Splits | Yes | "During the architecture search, we split the training dataset into halves as D = {D_x, D_θ} as done in Liu et al. (2019)." (see the data-split sketch after the table)
Hardware Specification | Yes | "The experiments were done with a single NVIDIA GTX 1080Ti GPU"
Software Dependencies | Yes | "ASNG-NAS is implemented using PyTorch 0.4.1 (Paszke et al., 2017)."
Experiment Setup | Yes | "In the architecture search phase, we optimize x and θ for 100 epochs (about 40K iterations) with a mini-batch size of 64. We use SGD with a momentum of 0.9 to optimize weights x. The step-size ε_x changes from 0.025 to 0 following the cosine schedule (Loshchilov & Hutter, 2017). After the architecture search phase, we retrain the network with the most likely architecture, ĉ = argmax_c p_θ(c), from scratch, which is a commonly used technique (Brock et al., 2018; Liu et al., 2019; Pham et al., 2018) to improve final performance. In the retraining stage, we can exclude the redundant (unused) weights. Then, we optimize x for 600 epochs with a mini-batch size of 80."
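The Open Datasets and Dataset Splits rows quote the standard CIFAR-10 pipeline and the half/half split D = {D_x, D_θ}. Below is a minimal PyTorch sketch of what such a split and the commonly used CIFAR-10 augmentation (pad-and-crop, horizontal flip, per-channel normalization) could look like; the transform values, the `SubsetRandomSampler`-based split, and the loader names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): split the CIFAR-10 training
# set into two halves, D_x for the weight updates and D_theta for the
# distribution-parameter updates, with the augmentation commonly used in
# one-shot NAS papers (pad-and-crop, horizontal flip, normalization).
import torch
import torchvision
import torchvision.transforms as transforms

CIFAR_MEAN = [0.4914, 0.4822, 0.4465]   # commonly used channel statistics
CIFAR_STD = [0.2470, 0.2435, 0.2616]

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
])

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_transform)

# Random half/half split of the 50k training images: D = {D_x, D_theta}.
indices = torch.randperm(len(train_set)).tolist()
half = len(train_set) // 2
loader_x = torch.utils.data.DataLoader(
    train_set, batch_size=64,
    sampler=torch.utils.data.SubsetRandomSampler(indices[:half]))
loader_theta = torch.utils.data.DataLoader(
    train_set, batch_size=64,
    sampler=torch.utils.data.SubsetRandomSampler(indices[half:]))
```

torch.utils.data.random_split would work equally well here; the sampler-based split is just one common way to keep a single dataset object behind both loaders.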
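The Pseudocode and Experiment Setup rows describe the search phase: alternating updates of the weights x (SGD, momentum 0.9, cosine-annealed step size from 0.025 to 0 over 100 epochs, mini-batch size 64) and of the distribution parameters θ via the paper's adaptive stochastic natural gradient. The outline below, reusing the loaders from the previous sketch, mirrors only the reported optimizer settings and the alternating structure of Algorithm 1; the stand-in `model` and the `update_theta` stub are hypothetical placeholders, and the ASNG step-size adaptation itself is not reproduced here.

```python
# Hypothetical outline of the search-phase loop; update_theta() is a stub
# standing in for the paper's adaptive stochastic natural gradient update
# of the categorical parameters theta, and the model is a trivial stand-in.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder
criterion = nn.CrossEntropyLoss()

def update_theta(model, images, labels):
    # Placeholder: the actual ASNG update (natural gradient on theta with
    # adaptive step size) is the paper's contribution and is omitted here.
    pass

# Reported settings: SGD with momentum 0.9, step size 0.025 -> 0 (cosine).
optimizer_x = torch.optim.SGD(model.parameters(), lr=0.025, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer_x, T_max=100, eta_min=0.0)

for epoch in range(100):  # search phase: 100 epochs, mini-batch size 64
    for (img_x, lab_x), (img_t, lab_t) in zip(loader_x, loader_theta):
        optimizer_x.zero_grad()
        loss = criterion(model(img_x), lab_x)   # weight step on D_x
        loss.backward()
        optimizer_x.step()
        update_theta(model, img_t, lab_t)       # theta step on D_theta
    scheduler.step()

# Afterwards the paper retrains the most likely architecture,
# c_hat = argmax_c p_theta(c), from scratch for 600 epochs (batch size 80).
```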