Neural Architecture Search as Sparse Supernet

Authors: Yan Wu, Aoming Liu, Zhiwu Huang, Siwei Zhang, Luc Van Gool

AAAI 2021, pp. 10379-10387 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on Convolutional Neural Network and Recurrent Neural Network search demonstrate that the proposed method is capable of searching for compact, general and powerful neural architectures.
Researcher Affiliation | Academia | (1) Computer Vision Lab, ETH Zürich, Switzerland; (2) VISICS, KU Leuven, Belgium
Pseudocode | Yes | Algorithm 1: Bi-level Optimization with the Proposed Hierarchical Accelerated Proximal Gradient (HAPG) Algorithm. (A hedged sketch of a generic bi-level update step of this kind is given after the table.)
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. The provided link is for the supplementary material PDF, not code.
Open Datasets | Yes | We evaluate the proposed Sparse NAS for Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) architecture search on CIFAR-10 and Penn Treebank (PTB) respectively, and further investigate the transferability of searched architectures on CIFAR-10 to CIFAR-100 and ImageNet. The references section contains citations for these datasets (e.g., [Merity, Keskar, and Socher 2018] for PTB, [Howard et al. 2017] for MobileNet).
Dataset Splits | Yes | where the network weights w and the architecture weights A are optimized on two separate training and validation sets to avoid architecture from overfitting to data. In both CNN and RNN cell search experiments, we follow the setup of DARTS (Liu, Simonyan, and Yang 2018) to implement Sparse NAS, where we use the same search space, cell setup and we stack the same number of cells for fair comparison. (A sketch of a DARTS-style two-way split of the training data is given after the table.)
Hardware Specification | No | The paper mentions 'GPU days' for search cost and 'an Amazon AWS grant, and an Nvidia GPU grant', implying the use of GPUs and cloud resources. However, it does not provide specific hardware details such as exact GPU/CPU models, memory amounts, or detailed computer specifications.
Software Dependencies | No | The paper references algorithms like Adam and frameworks like DARTS but does not provide specific software dependency versions (e.g., Python 3.x, PyTorch 1.x, CUDA 10.x).
Experiment Setup | Yes | where the network weights w and the architecture weights A are optimized on two separate training and validation sets to avoid architecture from overfitting to data. ϵ and γ are set to be small scalars as done in (Liu, Simonyan, and Yang 2018). We follow (Simon et al. 2013) to adopt a similar pathwise solution for an incremental increase of regularization factor λ, and we experimentally show the effectiveness of this progressive sparsifying solution. The horizontal axis indicates the different sparsity constraint factor λ. As a trade-off between architecture sparsity level and stability of searching process, we typically choose a value like 0.01 as the step size in our experiments. (A sketch of this incremental λ schedule is given after the table.)
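
Since the paper's code is not released (see the Open Source Code row), the following is a minimal Python sketch, not the authors' HAPG implementation, of a DARTS-style bi-level update step: the network weights w are updated on the training split, and the architecture weights A are updated on the validation split followed by an L1 proximal (soft-thresholding) step that drives entries of A toward zero. All names here (search_step, soft_threshold, model, arch_params, criterion, lam, arch_lr) are illustrative assumptions, and the hierarchical and accelerated aspects of the paper's Algorithm 1 are omitted.

```python
import torch

def soft_threshold(x, tau):
    # Proximal operator of the L1 norm: shrink each entry of x toward zero by tau.
    return torch.sign(x) * torch.clamp(x.abs() - tau, min=0.0)

def search_step(model, arch_params, w_optimizer, arch_optimizer,
                train_batch, val_batch, criterion, lam, arch_lr):
    # 1) Update the network weights w on a batch from the training split.
    x_tr, y_tr = train_batch
    w_optimizer.zero_grad()
    criterion(model(x_tr), y_tr).backward()
    w_optimizer.step()

    # 2) Update the architecture weights A on a batch from the validation split,
    #    then apply the proximal (soft-thresholding) step scaled by lambda.
    x_val, y_val = val_batch
    arch_optimizer.zero_grad()
    criterion(model(x_val), y_val).backward()
    arch_optimizer.step()
    with torch.no_grad():
        for a in arch_params:
            a.copy_(soft_threshold(a, lam * arch_lr))
```

The per-entry shrinkage above is only a stand-in: the paper's Algorithm 1 is described as a hierarchical accelerated proximal gradient method, so the actual proximal operator and any momentum/acceleration scheme may differ.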
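For the Dataset Splits row, a hedged sketch of the DARTS-convention split the quoted setup refers to: the CIFAR-10 training data are divided into two disjoint halves, one used to update w and one used as the "validation" set for A. The 50/50 ratio, batch size, and transform below are illustrative assumptions, not values quoted from the paper.

```python
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Split the CIFAR-10 training set into two disjoint halves: one drives the
# network-weight updates, the other drives the architecture-weight updates.
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
half = len(full_train) // 2
w_split, arch_split = random_split(full_train, [half, len(full_train) - half])

train_loader = DataLoader(w_split, batch_size=64, shuffle=True)
val_loader = DataLoader(arch_split, batch_size=64, shuffle=True)
```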
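Finally, a sketch of the progressive (pathwise) sparsification quoted in the Experiment Setup row: rather than searching once with a fixed large λ, λ is increased in small increments (the quoted step size is 0.01) and the bi-level search continues at each sparsity level. The loop reuses the illustrative names from the two sketches above; lam_max and the single-pass epoch structure are assumptions.

```python
lam, lam_step, lam_max = 0.0, 0.01, 0.1  # the 0.01 step is quoted; other values are illustrative

while lam <= lam_max:
    # One pass of bi-level search at the current sparsity level, then tighten lambda.
    for train_batch, val_batch in zip(train_loader, val_loader):
        search_step(model, arch_params, w_optimizer, arch_optimizer,
                    train_batch, val_batch, criterion, lam, arch_lr)
    lam += lam_step
```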