AutoShrink: A Topology-Aware NAS for Discovering Efficient Neural Architecture

Authors: Tunhou Zhang, Hsin-Pai Cheng, Zhenwen Li, Feng Yan, Chengyu Huang, Hai Li, Yiran Chen (pp. 6829-6836)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate AutoShrink on image classification and language tasks by crafting ShrinkCNN and ShrinkRNN models. ShrinkCNN is able to achieve up to 48% parameter reduction and save 34% Multiply-Accumulates (MACs) on ImageNet-1K with accuracy comparable to state-of-the-art (SOTA) models.
Researcher Affiliation Academia Tunhou Zhang (1), Hsin-Pai Cheng (1), Zhenwen Li (2), Feng Yan (3), Chengyu Huang (4), Hai Li (1), Yiran Chen (1); (1) ECE Department, Duke University, Durham, NC 27708; (2) Institute of Computational Linguistics, Peking University, Beijing, China; (3) CSE Department, University of Nevada, Reno, NV 89557; (4) Department of Electronic Engineering, Tsinghua University, Beijing
Pseudocode Yes Algorithm 1 AutoShrink
Open Source Code No No statement or link providing concrete access to the source code for the methodology described in this paper was found.
Open Datasets Yes We evaluate AutoShrink for CNN and RNN architecture search on image classification and language tasks, respectively. ShrinkCNN, which is crafted over the ImageNet-1K dataset (Deng et al. 2009), has accuracy performance similar to state-of-the-art (SOTA) techniques. We construct the proxy dataset for image classification tasks by randomly selecting 5,000 examples from the CIFAR-10 dataset (Krizhevsky and others 2009) with an equal distribution of classes. We adapt our representative RNN cell structure to the full Penn-Treebank dataset (Marcus et al. 1994) for crafting ShrinkRNN.
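The class-balanced proxy dataset described above (5,000 CIFAR-10 examples, i.e. 500 per class) can be sketched with a simple stratified sampler. This is an illustration, not the paper's unreleased code; the function name, seed, and toy labels are hypothetical:

```python
import random
from collections import defaultdict

def make_proxy_indices(labels, per_class, seed=0):
    """Sample an equal number of example indices per class.

    For CIFAR-10 as in the paper, per_class=500 yields the
    5,000-example proxy set with equal class distribution.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    proxy = []
    for y in sorted(by_class):
        # Draw per_class distinct indices from this class.
        proxy.extend(rng.sample(by_class[y], per_class))
    return proxy

# Toy usage: 100 fake labels over 10 classes, 5 samples per class.
labels = [i % 10 for i in range(100)]
idx = make_proxy_indices(labels, per_class=5)
```

On the real dataset the same indices would be passed to a subset wrapper around the CIFAR-10 training set (e.g. `torch.utils.data.Subset`).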
Dataset Splits No The paper mentions using 'validation accuracy' on a 'proxy dataset' and 'validation and test results on the Penn-Treebank dataset', and also references '50,000 images of ImageNet-1K validation dataset' in the Table 2 caption. However, it does not provide specific split percentages, sample counts, or explicit references to standard train/validation/test splits for the datasets used in their experiments.
Hardware Specification No The paper mentions 'GPU hours' and 'GPU seconds' for training and search time. It also states that 'resources provided by Amazon Web Services as part of the NSF BIGDATA program' were used. However, it does not specify any exact GPU/CPU models, processor types, or detailed computer specifications for the experimental setup.
Software Dependencies No The paper mentions software components like 'RMSprop optimizer' and 'ReLU activation', but does not provide specific names of programming languages, libraries, or solvers with version numbers (e.g., PyTorch 1.9, Python 3.8).
Experiment Setup Yes We adopt a similar pre-processing pipeline as Inception V3 (Szegedy et al. 2016) and use the RMSprop optimizer (Hinton, Srivastava, and Swersky 2012) with an initial learning rate of 0.1 to optimize the CNN architectures. The cosine learning decay suggested in SGDR (Loshchilov and Hutter 2016) is employed to reduce the generalization error. The NT-ASGD algorithm (Merity, Keskar, and Socher 2018) is used to train the ShrinkRNN architecture and the initial learning rate is set to 20. Additional regularization techniques include an ℓ2 regularization weighted by 8×10⁻⁷; variational dropout (Gal and Ghahramani 2016) of 0.2 to word embeddings, 0.75 to cell input, 0.25 to hidden nodes and 0.75 to output layer.
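The cosine learning-rate decay cited above (SGDR, Loshchilov and Hutter 2016) has a standard closed form. A minimal sketch, assuming a single annealing cycle without warm restarts, the paper's initial CNN learning rate of 0.1, and a final rate of 0 (the function name and the choice of total_steps are illustrative):

```python
import math

def cosine_lr(step, total_steps, lr_max=0.1, lr_min=0.0):
    """Cosine learning-rate decay (single SGDR cycle, no restarts).

    Anneals from lr_max at step 0 down to lr_min at total_steps:
        lr(t) = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / T))
    lr_max=0.1 matches the paper's initial RMSprop learning rate.
    """
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1.0 + math.cos(math.pi * step / total_steps)
    )

# Usage: query the schedule at the start, middle, and end of training.
start = cosine_lr(0, 100)    # full initial rate, 0.1
mid = cosine_lr(50, 100)     # halfway point, 0.05
end = cosine_lr(100, 100)    # fully decayed, 0.0
```

In a framework-based setup the same schedule is typically obtained from a built-in scheduler (e.g. `torch.optim.lr_scheduler.CosineAnnealingLR`) rather than computed by hand.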