Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search
Authors: Youhei Akimoto, Shinichi Shirakawa, Nozomu Yoshinari, Kento Uchida, Shota Saito, Kouhei Nishida
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Despite its simplicity and no problem-dependent parameter tuning, our method exhibited near state-of-the-art performances with low computational budgets both on image classification and inpainting tasks. |
| Researcher Affiliation | Collaboration | (1) University of Tsukuba & RIKEN AIP, (2) Yokohama National University, (3) Skill Up AI Co., Ltd., (4) Shinshu University. |
| Pseudocode | Yes | Algorithm 1 ASNG-NAS |
| Open Source Code | Yes | The code is available at https://github.com/shirakawas/ASNG-NAS. |
| Open Datasets | Yes | We use the CIFAR-10 dataset and adopt the standard preprocessing and data augmentation as done in the previous works, e.g., Liu et al. (2019); Pham et al. (2018). We use the Celeb Faces Attributes Dataset (Celeb A) (Liu et al., 2015). |
| Dataset Splits | Yes | During the architecture search, we split the training dataset into halves as D = {D_x, D_θ} as done in Liu et al. (2019). (A minimal split sketch is given below the table.) |
| Hardware Specification | Yes | The experiments were done with a single NVIDIA GTX 1080Ti GPU |
| Software Dependencies | Yes | ASNG-NAS is implemented using PyTorch 0.4.1 (Paszke et al., 2017). |
| Experiment Setup | Yes | In the architecture search phase, we optimize x and θ for 100 epochs (about 40K iterations) with a mini-batch size of 64. We use SGD with a momentum of 0.9 to optimize weights x. The step-size ε_x changes from 0.025 to 0 following the cosine schedule (Loshchilov & Hutter, 2017). After the architecture search phase, we retrain the network with the most likely architecture, ĉ = argmax_c p_θ(c), from scratch, which is a commonly used technique (Brock et al., 2018; Liu et al., 2019; Pham et al., 2018) to improve final performance. In the retraining stage, we can exclude the redundant (unused) weights. Then, we optimize x for 600 epochs with a mini-batch size of 80. |
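The half-and-half split quoted in the "Dataset Splits" row can be reproduced in a few lines. The sketch below is illustrative only: it targets a current PyTorch release rather than the 0.4.1 version used in the paper, and the names `D_x` / `D_theta` are placeholders, not the authors' code.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Standard CIFAR-10 preprocessing as in prior NAS work
# (random crop, horizontal flip, tensor conversion).
transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transform)

# Split the training set into two equal halves:
# D_x updates the weights x, D_theta updates the distribution
# parameters theta (names chosen here for illustration).
n_half = len(train_set) // 2
D_x, D_theta = random_split(train_set,
                            [n_half, len(train_set) - n_half],
                            generator=torch.Generator().manual_seed(0))
```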
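As a reading aid for the "Experiment Setup" quote, here is a minimal PyTorch sketch of its two ingredients: the SGD-with-momentum weight optimizer under a cosine step-size schedule, and selecting the most likely architecture ĉ = argmax_c p_θ(c) from a factorized categorical distribution before retraining. `model`, `theta`, and the helper names are assumptions for illustration, not the authors' interface.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

def make_weight_optimizer(model, epochs=100):
    # SGD with momentum 0.9; the step size eps_x follows a cosine
    # schedule from 0.025 down to 0 over the search epochs.
    opt = SGD(model.parameters(), lr=0.025, momentum=0.9)
    sched = CosineAnnealingLR(opt, T_max=epochs, eta_min=0.0)
    return opt, sched

def most_likely_architecture(theta):
    # theta: a list of 1-D probability vectors, one per categorical
    # architecture decision. Taking the argmax of each factor gives
    # the most likely architecture, which is then retrained from
    # scratch with its unused weights discarded.
    return [int(torch.argmax(p)) for p in theta]
```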