Neural Architecture Search as Sparse Supernet
Authors: Yan Wu, Aoming Liu, Zhiwu Huang, Siwei Zhang, Luc Van Gool
AAAI 2021, pp. 10379-10387 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on Convolutional Neural Network and Recurrent Neural Network search demonstrate that the proposed method is capable of searching for compact, general and powerful neural architectures. |
| Researcher Affiliation | Academia | 1 Computer Vision Lab, ETH Zürich, Switzerland; 2 VISICS, KU Leuven, Belgium |
| Pseudocode | Yes | Algorithm 1: Bi-level Optimization with the Proposed Hierarchical Accelerated Proximal Gradient (HAPG) Algorithm. An illustrative sketch of this bi-level procedure appears below the table. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the methodology described in the paper. The provided link is for the supplementary material PDF, not code. |
| Open Datasets | Yes | We evaluate the proposed Sparse NAS for Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) architecture search on CIFAR-10 and Penn Treebank (PTB) respectively, and further investigate the transferability of searched architectures on CIFAR-10 to CIFAR-100 and ImageNet. The references section contains citations for these datasets (e.g., [Merity, Keskar, and Socher 2018] for PTB, [Howard et al. 2017] for MobileNet). |
| Dataset Splits | Yes | where the network weights w and the architecture weights A are optimized on two separate training and validation sets to avoid architecture from overfitting to data. In both CNN and RNN cell search experiments, we follow the setup of DARTS (Liu, Simonyan, and Yang 2018) to implement Sparse NAS, where we use the same search space, cell setup and we stack the same number of cells for fair comparison. |
| Hardware Specification | No | The paper mentions 'GPU days' for search cost and 'an Amazon AWS grant, and an Nvidia GPU grant', implying the use of GPUs and cloud resources. However, it does not provide specific hardware details such as exact GPU/CPU models, memory amounts, or detailed computer specifications. |
| Software Dependencies | No | The paper references algorithms like Adam and frameworks like DARTS but does not provide specific software dependency versions (e.g., Python 3.x, PyTorch 1.x, CUDA 10.x). |
| Experiment Setup | Yes | where the network weights w and the architecture weights A are optimized on two separate training and validation sets to avoid architecture from overfitting to data. ϵ and γ are set to be small scalars as done in (Liu, Simonyan, and Yang 2018). We follow (Simon et al. 2013) to adopt a similar pathwise solution for an incremental increase of regularization factor λ, and we experimentally show the effectiveness of this progressive sparsifying solution. The horizontal axis indicates the different sparsity constraint factor λ. As a trade-off between architecture sparsity level and stability of searching process, we typically choose a value like 0.01 as the step size in our experiments. An illustrative warm-start loop over increasing λ values is sketched after the bi-level example below. |
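
The Pseudocode and Dataset Splits rows describe the core search procedure: the network weights w are updated on a training split while the architecture weights A are updated on a held-out validation split, with a hierarchical accelerated proximal gradient step driving A toward group sparsity. The sketch below is a minimal toy illustration of that alternation, not the authors' implementation: the linear surrogate objective, the `loss_and_grads`, `group_soft_threshold`, and `search` helpers, the learning rates, and the group structure (one group per row of A) are all assumptions made for illustration only.

```python
import numpy as np

def group_soft_threshold(A, tau):
    """Proximal operator of tau * sum_g ||A_g||_2 (group lasso).
    Each row of A is treated as one group, e.g. all candidate-op weights on one edge."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * A

def loss_and_grads(w, A, X, y):
    """Toy surrogate: predictions depend on both the 'network weights' w and the
    'architecture weights' A through pred = X @ A @ w (a crude stand-in for a supernet)."""
    pred = X @ A @ w
    resid = pred - y
    n = len(y)
    loss = 0.5 * np.mean(resid ** 2)
    g = X.T @ resid / n                       # gradient w.r.t. the mixed features A @ w
    return loss, A.T @ g, np.outer(g, w)      # loss, grad_w, grad_A

def search(X_train, y_train, X_val, y_val, lam=0.01, steps=200,
           lr_w=0.1, lr_A=0.1, init=None, num_ops=4, seed=0):
    """One run of the alternating (bi-level) search for a fixed sparsity factor lam."""
    d = X_train.shape[1]
    if init is None:
        rng = np.random.default_rng(seed)
        w = rng.normal(scale=0.1, size=num_ops)
        A = rng.normal(scale=0.1, size=(d, num_ops))
    else:
        w, A = (x.copy() for x in init)
    A_prev = A.copy()
    for t in range(1, steps + 1):
        # Lower level: gradient step on the "network weights" w using the training split.
        _, gw, _ = loss_and_grads(w, A, X_train, y_train)
        w = w - lr_w * gw
        # Upper level: accelerated proximal gradient step on the "architecture weights" A
        # using the validation split (Nesterov-style extrapolation + group soft-thresholding).
        beta = (t - 1.0) / (t + 2.0)
        A_look = A + beta * (A - A_prev)
        _, _, gA = loss_and_grads(w, A_look, X_val, y_val)
        A_prev = A
        A = group_soft_threshold(A_look - lr_A * gA, lr_A * lam)
    return w, A
```

The separate training/validation splits mirror the bi-level setup quoted above; the proximal (soft-thresholding) step is what lets whole groups of architecture weights shrink exactly to zero rather than merely becoming small.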
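The Experiment Setup row also mentions a pathwise solution in which the regularization factor λ is increased incrementally, with a typical step size of 0.01. A hedged usage example of that progressive sparsification, warm-starting each run of the `search` sketch above from the previous solution, might look like the following; the toy data, the λ range, and the 1e-6 threshold for counting active groups are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Toy data split into "training" and "validation" parts (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 8))
y = rng.normal(size=256)
X_train, y_train, X_val, y_val = X[:192], y[:192], X[192:], y[192:]

# Progressive sparsification: sweep lambda upward in steps of 0.01,
# reusing the previous (w, A) as the warm start for the next value.
state = None
for lam in np.arange(0.0, 0.05 + 1e-9, 0.01):
    w, A = search(X_train, y_train, X_val, y_val, lam=lam, init=state)
    state = (w, A)
    active_groups = int((np.linalg.norm(A, axis=1) > 1e-6).sum())
    print(f"lambda={lam:.2f}: {active_groups} active groups")
```

Warm-starting along the λ path keeps each individual search close to its previous solution, which is the stability/sparsity trade-off the quoted step-size remark refers to.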