Slimmable Neural Networks
Authors: Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, Thomas Huang
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first conduct comprehensive experiments on the ImageNet classification task to show the effectiveness of switchable batch normalization for training slimmable neural networks. Compared with individually trained networks, we demonstrate similar (and in many cases better) performance of slimmable MobileNet v1 [0.25, 0.5, 0.75, 1.0], MobileNet v2 [0.35, 0.5, 0.75, 1.0], ShuffleNet [0.5, 1.0, 2.0] and ResNet-50 [0.25, 0.5, 0.75, 1.0] ([ ] denotes available switches). |
| Researcher Affiliation | Collaboration | Jiahui Yu¹, Linjie Yang², Ning Xu², Jianchao Yang³, Thomas Huang¹ — ¹University of Illinois at Urbana-Champaign, ²Snap Inc., ³ByteDance Inc. |
| Pseudocode | Yes | Algorithm 1 illustrates a memory-efficient implementation of the training framework, which is straightforward to integrate into current neural network libraries. |
| Open Source Code | Yes | Code and models are available at: https://github.com/JiahuiYu/slimmable_networks. |
| Open Datasets | Yes | We experiment with the ImageNet (Deng et al., 2009) classification dataset with 1000 classes. It is comprised of around 1.28M training images and 50K validation images. We train all models on COCO 2017 train set and report Average Precision (AP) on COCO 2017 validation set in Table 5. |
| Dataset Splits | Yes | We experiment with the ImageNet (Deng et al., 2009) classification dataset with 1000 classes. It is comprised of around 1.28M training images and 50K validation images. We train all models on COCO 2017 train set and report Average Precision (AP) on COCO 2017 validation set in Table 5. |
| Hardware Specification | Yes | All models are trained on 4 Tesla P100 GPUs and the batch mean and variance of batch normalization are computed within each GPU. |
| Software Dependencies | No | The information is insufficient. The paper mentions using 'stochastic gradient descent (SGD) as optimizer', 'MMDetection (Chen et al., 2018)' and 'Detectron (Girshick et al., 2018)' frameworks, and implies a 'pytorch-style' model. However, specific version numbers for these software dependencies (e.g., PyTorch version, MMDetection version) are not provided. |
| Experiment Setup | Yes | For MobileNet v1 and MobileNet v2, we train 480 epochs with mini-batch size 160, and exponentially (γ = 0.98) decrease the learning rate starting from 0.045 per epoch. For ShuffleNet (g = 3), we train 250 epochs with mini-batch size 512, and linearly decrease the learning rate from 0.25 to 0 per iteration. For ResNet-50, we train 100 epochs with mini-batch size 256, and decrease the learning rate by 10× at epochs 30, 60 and 90. We use stochastic gradient descent (SGD) as the optimizer, Nesterov momentum with a momentum weight of 0.9 without dampening, and a weight decay of 10⁻⁴ for all training settings. |
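
The switchable batch normalization referenced in the Research Type row keeps an independent set of batch-normalization parameters and running statistics for each width switch. Below is a minimal PyTorch-style sketch of that idea; the class and attribute names are our own and are not taken from the released code.

```python
import torch.nn as nn


class SwitchableBatchNorm2d(nn.Module):
    """Sketch of switchable batch normalization: one private BatchNorm2d
    per width switch, selected by the currently active switch index."""

    def __init__(self, num_features_list):
        super().__init__()
        # independent scale, shift, and running statistics per switch
        self.bn = nn.ModuleList(
            [nn.BatchNorm2d(f) for f in num_features_list]
        )
        self.active_switch = len(num_features_list) - 1  # default: widest

    def forward(self, x):
        return self.bn[self.active_switch](x)
```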
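The Pseudocode row cites Algorithm 1, a memory-efficient training step in which every mini-batch is forwarded once per width switch and the accumulated gradients are applied in a single optimizer update. A hedged sketch of that per-batch loop follows; `model`, `optimizer`, `train_loader`, and the helper `set_active_switch` are assumed to exist (the helper would select the matching `SwitchableBatchNorm2d` branch and channel slice for a given width).

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
width_switches = [0.25, 0.5, 0.75, 1.0]  # e.g. the MobileNet v1 switches

for images, targets in train_loader:
    optimizer.zero_grad()
    for width in width_switches:
        set_active_switch(model, width)        # hypothetical helper
        loss = criterion(model(images), targets)
        loss.backward()                        # gradients accumulate over switches
    optimizer.step()                           # single update per mini-batch
```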
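Following the Experiment Setup row, the MobileNet training recipe maps onto standard PyTorch components roughly as below: SGD with Nesterov momentum 0.9 (no dampening), weight decay 10⁻⁴, and a per-epoch exponential learning-rate decay with γ = 0.98 starting from 0.045. This is a sketch under those stated hyperparameters; `model` and `train_one_epoch` are placeholders, not code from the paper.

```python
import torch

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.045,            # initial learning rate
    momentum=0.9,        # Nesterov momentum weight
    dampening=0,
    weight_decay=1e-4,
    nesterov=True,
)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)

for epoch in range(480):
    train_one_epoch(model, optimizer)  # hypothetical training helper
    scheduler.step()                   # decay the learning rate each epoch
```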