Growing Efficient Deep Networks by Structured Continuous Sparsification
Authors: Xin Yuan, Pedro Henrique Pamplona Savarese, Michael Maire
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate these advantages while comparing to recent NAS and pruning methods through extensive experiments on classification, semantic segmentation, and word-level language modeling. |
| Researcher Affiliation | Academia | Xin Yuan, University of Chicago, yuanx@uchicago.edu; Pedro Savarese, TTI-Chicago, savarese@ttic.edu; Michael Maire, University of Chicago, mmaire@uchicago.edu |
| Pseudocode | Yes | Algorithm 1: Optimization. Input: data X = (x_i)_{i=1..n}, labels Y = (y_i)_{i=1..n}. Output: grown efficient model G. Initialize G, w, u, λ1^base, and λ2^base; set t_s as all-zero vectors associated with the σ functions. For epoch = 1 to T: evaluate G's sparsity u_G and compute Δu = u − u_G; update λ1 ← λ1^base · Δu and λ2 ← λ2^base · Δu in Eq. (6) using Eq. (4); for r = 1 to R, sample a mini-batch (x_i, y_i) from (X, Y) and train G using Eq. (6) with SGD; then sample indicators q_{c,l} ~ Bern(σ(β s_{c,l})), record the indices idx where q = 1, update t_s[idx] ← t_s[idx] + 1, and update β using Eq. (7). Return G. (A hedged PyTorch sketch of this loop follows the table.) |
| Open Source Code | No | The paper does not provide an explicit link to open-source code for the methodology described, nor does it state that the code will be made available in supplementary materials. |
| Open Datasets | Yes | For image classification, we use CIFAR-10 (Krizhevsky et al., 2014) and ImageNet (Deng et al., 2009)... For semantic segmentation, we use the PASCAL VOC 2012 (Everingham et al., 2015) benchmark... For language modeling, we use the word-level Penn Treebank (PTB) dataset (Marcus et al., 1993)... |
| Dataset Splits | Yes | CIFAR-10 consists of 60,000 images of 10 classes, with 6,000 images per class. The train and test sets contain 50,000 and 10,000 images respectively. ImageNet is a large dataset for visual recognition which contains over 1.2M images in the training set and 50K images in the validation set covering 1,000 categories. For semantic segmentation, we use the PASCAL VOC 2012... The original dataset contains 1,464 (train), 1,449 (val), and 1,456 (test) pixel-level labeled images for training, validation, and testing, respectively. The dataset is augmented by the extra annotations provided by (Hariharan et al., 2011), resulting in 10,582 training images. For language modeling, we use the word-level Penn Treebank (PTB) dataset (Marcus et al., 1993) which consists of 929k training words, 73k validation words, and 82k test words, with 10,000 unique words in its vocabulary. |
| Hardware Specification | No | The paper mentions '4 TITAN V GPUs' when comparing against AutoGrow's training time, but does not specify the hardware used for its own experiments. |
| Software Dependencies | No | The paper references GitHub repositories that imply the use of PyTorch (e.g., 'pytorch-resnet-cifar10', 'pytorch-cifar', 'DeepLabv3.pytorch'), but it does not specify versions for PyTorch or any other dependencies. |
| Experiment Setup | Yes | For model weights, we adopt the same hyperparameters used to train the corresponding unpruned baseline models, except for setting the dropout keep probability for language modeling to 0.65. We initialize mask weights such that a single filter is activated in each layer. We train with SGD, an initial learning rate of 0.1, weight decay of 10⁻⁶ and momentum 0.9. Trade-off parameter λ1^base is set to 0.5 on all tasks; λ2 is not used since we do not perform layer growing here. We set σ as the sigmoid function and γ as 100^(1/T) where T is the total number of epochs. VGG-16, ResNet-20, and Wide ResNet-28-10 are trained for 160, 160, and 200 epochs, respectively, with a batch size of 128 and initial learning rate of 0.1. For VGG-16 and ResNet-20, we divide the learning rate by 10 at epochs 80 and 120, and set the weight decay and momentum to 10⁻⁴ and 0.9. For Wide ResNet-28-10, the learning rate is divided by 5 at epochs 60, 120, and 160; the weight decay and momentum are set to 5×10⁻⁴ and 0.9. (A minimal sketch of these classification schedules also follows the table.) |
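
To make the Algorithm 1 pseudocode quoted above concrete, the sketch below is a minimal PyTorch rendition of its outer loop: per-unit mask logits s gated by σ(β·s), a sparsity penalty whose weight follows the λ1 ← λ1^base·Δu update, Bernoulli indicator sampling with counters t_s, and temperature annealing at rate γ. All class, variable, and helper names (`MaskedLinear`, `soft_density`, `u_target`) are illustrative assumptions, the data is synthetic, and Eqs. (4), (6), and (7) are only approximated; this is not the authors' released code, and layer growing (λ2) is omitted, matching the setup row above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """Linear layer whose output units are gated by soft masks sigma(beta * s)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # One mask logit per output unit; initialized so a single unit starts active,
        # mirroring "a single filter is activated in each layer".
        s = torch.full((out_features,), -1.0)
        s[0] = 1.0
        self.s = nn.Parameter(s)

    def forward(self, x, beta):
        gate = torch.sigmoid(beta * self.s)  # soft, differentiable structured mask
        return self.linear(x) * gate

def soft_density(layers, beta):
    """Mean soft gate value: a stand-in for the sparsity measure of Eq. (4)."""
    gates = torch.cat([torch.sigmoid(beta * m.s) for m in layers])
    return gates.mean()

# --- toy setup; shapes, data, and budget are placeholders ---
layers = nn.ModuleList([MaskedLinear(32, 64), MaskedLinear(64, 10)])
optimizer = torch.optim.SGD(layers.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-6)

T, R = 20, 50                        # epochs and mini-batches per epoch
u_target = 0.5                       # growth budget u (assumed normalization)
lambda1_base = 0.5                   # trade-off weight, as in the setup row above
beta, gamma = 1.0, 100 ** (1.0 / T)  # temperature and per-epoch annealing factor
t_s = [torch.zeros_like(m.s) for m in layers]  # indicator counters

for epoch in range(T):
    # lambda1 <- lambda1_base * (u - u_G), as written in Algorithm 1; how the
    # sparsity measure is normalized is an assumption on our part.
    delta_u = u_target - soft_density(layers, beta).item()
    lam1 = lambda1_base * delta_u
    for _ in range(R):
        x = torch.randn(128, 32)                 # stand-in mini-batch
        y = torch.randint(0, 10, (128,))
        h = layers[1](torch.relu(layers[0](x, beta)), beta)
        loss = F.cross_entropy(h, y) + lam1 * soft_density(layers, beta)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Sample hard indicators q ~ Bern(sigma(beta * s)) and count how often each
    # unit comes up active (the t_s counters of Algorithm 1).
    with torch.no_grad():
        for i, m in enumerate(layers):
            q = torch.bernoulli(torch.sigmoid(beta * m.s))
            t_s[i] += q
    beta *= gamma  # anneal the masks toward binary; stands in for the Eq. (7) update
```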
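
Separately, the classification schedules in the Experiment Setup row map directly onto a standard SGD + MultiStepLR configuration. The helper below is a hedged sketch of that mapping; the function name and architecture keys are ours, and model construction and data loading are omitted.

```python
import torch

def make_optimizer_and_schedule(model, arch):
    """Optimizer and LR schedule matching the quoted classification setups."""
    if arch in ("vgg16", "resnet20"):
        # 160 epochs; lr 0.1 divided by 10 at epochs 80 and 120; weight decay 1e-4
        opt = torch.optim.SGD(model.parameters(), lr=0.1,
                              momentum=0.9, weight_decay=1e-4)
        sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[80, 120], gamma=0.1)
    elif arch == "wrn28_10":
        # 200 epochs; lr 0.1 divided by 5 at epochs 60, 120, 160; weight decay 5e-4
        opt = torch.optim.SGD(model.parameters(), lr=0.1,
                              momentum=0.9, weight_decay=5e-4)
        sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[60, 120, 160], gamma=0.2)
    else:
        raise ValueError(f"unknown architecture: {arch}")
    return opt, sched
```

In use, `sched.step()` would be called once per epoch after the optimizer updates, with a batch size of 128 as stated in the setup row.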