Slimmable Neural Networks

Authors: Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, Thomas Huang

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first conduct comprehensive experiments on the ImageNet classification task to show the effectiveness of switchable batch normalization for training slimmable neural networks. Compared with individually trained networks, we demonstrate similar (and in many cases better) performance of slimmable MobileNet v1 [0.25, 0.5, 0.75, 1.0], MobileNet v2 [0.35, 0.5, 0.75, 1.0], ShuffleNet [0.5, 1.0, 2.0] and ResNet-50 [0.25, 0.5, 0.75, 1.0] ([ ] denotes available switches).
Researcher Affiliation | Collaboration | Jiahui Yu (1), Linjie Yang (2), Ning Xu (2), Jianchao Yang (3), Thomas Huang (1); (1) University of Illinois at Urbana-Champaign, (2) Snap Inc., (3) ByteDance Inc.
Pseudocode | Yes | Algorithm 1 illustrates a memory-efficient implementation of the training framework, which is straightforward to integrate into current neural network libraries. (A minimal sketch of this training scheme is given after the table.)
Open Source Code | Yes | Code and models are available at: https://github.com/JiahuiYu/slimmable_networks.
Open Datasets | Yes | We experiment with the ImageNet (Deng et al., 2009) classification dataset with 1000 classes. It comprises around 1.28M training images and 50K validation images. We train all models on the COCO 2017 train set and report Average Precision (AP) on the COCO 2017 validation set in Table 5.
Dataset Splits | Yes | We experiment with the ImageNet (Deng et al., 2009) classification dataset with 1000 classes. It comprises around 1.28M training images and 50K validation images. We train all models on the COCO 2017 train set and report Average Precision (AP) on the COCO 2017 validation set in Table 5.
Hardware Specification | Yes | All models are trained on 4 Tesla P100 GPUs, and the batch mean and variance of batch normalization are computed within each GPU.
Software Dependencies | No | The information is insufficient. The paper mentions using 'stochastic gradient descent (SGD) as optimizer', the 'MMDetection (Chen et al., 2018)' and 'Detectron (Girshick et al., 2018)' frameworks, and implies a 'pytorch-style' model. However, specific version numbers for these software dependencies (e.g., PyTorch version, MMDetection version) are not provided.
Experiment Setup | Yes | For MobileNet v1 and MobileNet v2, we train for 480 epochs with mini-batch size 160 and exponentially decay the learning rate (γ = 0.98 per epoch) starting from 0.045. For ShuffleNet (g = 3), we train for 250 epochs with mini-batch size 512 and linearly decrease the learning rate from 0.25 to 0 per iteration. For ResNet-50, we train for 100 epochs with mini-batch size 256 and divide the learning rate by 10 at epochs 30, 60 and 90. We use stochastic gradient descent (SGD) as the optimizer with Nesterov momentum of 0.9 without dampening, and a weight decay of 10^-4 for all training settings.
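
The Research Type and Pseudocode rows above describe training one set of shared weights under several width switches, with a private batch normalization layer per switch (switchable batch normalization) and a single parameter update per iteration. The following is a minimal PyTorch-style sketch of that idea, not the authors' released implementation: WIDTH_MULTS, SwitchableBatchNorm2d, set_active_width and train_step are illustrative names, and the slicing of convolutional channels per switch is omitted for brevity.

    import torch
    import torch.nn as nn

    # Hypothetical width switches; the paper's MobileNet v1 setting uses these values.
    WIDTH_MULTS = [0.25, 0.5, 0.75, 1.0]

    class SwitchableBatchNorm2d(nn.Module):
        """Switchable batch normalization: one private BatchNorm2d per width switch.
        Convolutional weights are shared across switches; only normalization
        statistics and affine parameters are kept separately."""
        def __init__(self, num_features_per_switch):
            super().__init__()
            self.bns = nn.ModuleList(
                [nn.BatchNorm2d(nf) for nf in num_features_per_switch])
            self.active_idx = len(num_features_per_switch) - 1  # default: full width

        def forward(self, x):
            return self.bns[self.active_idx](x)

    def set_active_width(model, width_idx):
        # Hypothetical helper: point every switchable BN at the requested switch.
        for m in model.modules():
            if isinstance(m, SwitchableBatchNorm2d):
                m.active_idx = width_idx

    def train_step(model, images, labels, criterion, optimizer):
        # One iteration in the spirit of Algorithm 1: accumulate gradients from
        # every width switch, then apply a single update to the shared weights.
        optimizer.zero_grad()
        for idx, _width in enumerate(WIDTH_MULTS):
            set_active_width(model, idx)
            loss = criterion(model(images), labels)
            loss.backward()  # gradients accumulate across switches
        optimizer.step()

The separate BN layers add only a small number of extra parameters relative to the shared convolutions, which is what keeps per-switch normalization cheap in this scheme.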
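
The Experiment Setup row maps directly onto standard optimizer and scheduler configurations. The snippet below is a sketch of the MobileNet v1/v2 schedule under the stated hyperparameters (SGD with Nesterov momentum 0.9, no dampening, weight decay 1e-4, learning rate 0.045 decayed exponentially by 0.98 per epoch for 480 epochs); the model and the data-loading loop are assumed to be defined elsewhere.

    import torch

    # model: a slimmable network (e.g. MobileNet v1) assumed to be built elsewhere.
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.045,          # initial learning rate for MobileNet v1/v2
        momentum=0.9,      # Nesterov momentum weight
        dampening=0,       # "without dampening"
        weight_decay=1e-4,
        nesterov=True,
    )

    # Exponential decay, gamma = 0.98 per epoch, over 480 epochs (mini-batch size 160).
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)

    # The ResNet-50 setting would instead use a step schedule, e.g.:
    # torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)

    for epoch in range(480):
        # ... one epoch over the ImageNet training set ...
        scheduler.step()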