Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search
Authors: Youhei Akimoto, Shinichi Shirakawa, Nozomu Yoshinari, Kento Uchida, Shota Saito, Kouhei Nishida
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Despite its simplicity and no problem-dependent parameter tuning, our method exhibited near state-of-the-art performances with low computational budgets both on image classification and inpainting tasks. |
| Researcher Affiliation | Collaboration | ¹University of Tsukuba & RIKEN AIP; ²Yokohama National University; ³SkillUp AI Co., Ltd.; ⁴Shinshu University. |
| Pseudocode | Yes | Algorithm 1 ASNG-NAS |
| Open Source Code | Yes | The code is available at https://github.com/shirakawas/ASNG-NAS. |
| Open Datasets | Yes | We use the CIFAR-10 dataset and adopt the standard preprocessing and data augmentation as done in the previous works, e.g., Liu et al. (2019); Pham et al. (2018). We use the CelebFaces Attributes Dataset (CelebA) (Liu et al., 2015). |
| Dataset Splits | Yes | During the architecture search, we split the training dataset into halves as $D = \{D_x, D_\theta\}$ as done in Liu et al. (2019). |
| Hardware Specification | Yes | The experiments were done with a single NVIDIA GTX 1080Ti GPU |
| Software Dependencies | Yes | ASNG-NAS is implemented using PyTorch 0.4.1 (Paszke et al., 2017). |
| Experiment Setup | Yes | In the architecture search phase, we optimize $x$ and $\theta$ for 100 epochs (about 40K iterations) with a mini-batch size of 64. We use SGD with a momentum of 0.9 to optimize weights $x$. The step-size $\epsilon_x$ changes from 0.025 to 0 following the cosine schedule (Loshchilov & Hutter, 2017). After the architecture search phase, we retrain the network with the most likely architecture, $\hat{c} = \arg\max_c p_\theta(c)$, from scratch, which is a commonly used technique (Brock et al., 2018; Liu et al., 2019; Pham et al., 2018) to improve final performance. In the retraining stage, we can exclude the redundant (unused) weights. Then, we optimize $x$ for 600 epochs with a mini-batch size of 80. |
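
The search-phase numbers quoted in the table can be checked with a small sketch. It is not taken from the ASNG-NAS repository: the helper names `split_in_halves` and `cosine_step_size` are hypothetical, and the random shuffle, the per-iteration schedule update, and the rounding of the last partial batch are assumptions. Only the 50/50 split, the batch size of 64, the 100 epochs, and the 0.025 to 0 cosine range come from the quoted text; the 50,000-image training set is the standard CIFAR-10 size.

```python
import math
import random


def split_in_halves(indices, seed=0):
    """Shuffle indices and split them into two equal halves.

    The first half plays the role of D_x (weight updates) and the second
    of D_theta (architecture-parameter updates). Shuffling with a fixed
    seed is an assumption; the quoted text only says "halves".
    """
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def cosine_step_size(step, total_steps, eps_max=0.025, eps_min=0.0):
    """Cosine annealing from eps_max at step 0 to eps_min at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return eps_min + 0.5 * (eps_max - eps_min) * (1.0 + math.cos(math.pi * progress))


NUM_TRAIN = 50_000  # standard CIFAR-10 training set size
BATCH_SIZE = 64     # search-phase mini-batch size from the table
EPOCHS = 100        # search-phase epochs from the table

d_x, d_theta = split_in_halves(range(NUM_TRAIN))

# Assumption: one epoch is one pass over D_x, keeping the last partial batch.
iters_per_epoch = math.ceil(len(d_x) / BATCH_SIZE)
total_iters = iters_per_epoch * EPOCHS

if __name__ == "__main__":
    print(f"|D_x| = {len(d_x)}, |D_theta| = {len(d_theta)}")
    print(f"iterations per epoch = {iters_per_epoch}, total = {total_iters}")
    for step in (0, total_iters // 2, total_iters):
        print(f"step {step}: step-size = {cosine_step_size(step, total_iters):.5f}")
```

Under these assumptions the iteration count works out to 391 per epoch and 39,100 in total, which agrees with the "about 40K iterations" quoted in the table.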