Slimmable Neural Networks

Authors: Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, Thomas Huang

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first conduct comprehensive experiments on the ImageNet classification task to show the effectiveness of switchable batch normalization for training slimmable neural networks. Compared with individually trained networks, we demonstrate similar (and in many cases better) performance of slimmable MobileNet v1 [0.25, 0.5, 0.75, 1.0], MobileNet v2 [0.35, 0.5, 0.75, 1.0], ShuffleNet [0.5, 1.0, 2.0] and ResNet-50 [0.25, 0.5, 0.75, 1.0] ([ ] denotes available switches).
Researcher Affiliation | Collaboration | Jiahui Yu (1), Linjie Yang (2), Ning Xu (2), Jianchao Yang (3), Thomas Huang (1); (1) University of Illinois at Urbana-Champaign, (2) Snap Inc., (3) ByteDance Inc.
Pseudocode | Yes | Algorithm 1 illustrates a memory-efficient implementation of the training framework, which is straightforward to integrate into current neural network libraries. (A minimal sketch of this training scheme is given after the table.)
Open Source Code | Yes | Code and models are available at: https://github.com/JiahuiYu/slimmable_networks.
Open Datasets | Yes | We experiment with the ImageNet (Deng et al., 2009) classification dataset with 1000 classes. It comprises around 1.28M training images and 50K validation images. We train all models on the COCO 2017 train set and report Average Precision (AP) on the COCO 2017 validation set in Table 5.
Dataset Splits | Yes | We experiment with the ImageNet (Deng et al., 2009) classification dataset with 1000 classes. It comprises around 1.28M training images and 50K validation images. We train all models on the COCO 2017 train set and report Average Precision (AP) on the COCO 2017 validation set in Table 5.
Hardware Specification | Yes | All models are trained on 4 Tesla P100 GPUs, and the batch mean and variance of batch normalization are computed within each GPU.
Software Dependencies | No | The information is insufficient. The paper mentions using 'stochastic gradient descent (SGD) as optimizer', the 'MMDetection (Chen et al., 2018)' and 'Detectron (Girshick et al., 2018)' frameworks, and implies a 'pytorch-style' model. However, specific version numbers for these software dependencies (e.g., PyTorch version, MMDetection version) are not provided.
Experiment Setup | Yes | For MobileNet v1 and MobileNet v2, we train for 480 epochs with mini-batch size 160 and exponentially decay the learning rate (γ = 0.98 per epoch) starting from 0.045. For ShuffleNet (g = 3), we train for 250 epochs with mini-batch size 512 and linearly decrease the learning rate from 0.25 to 0 per iteration. For ResNet-50, we train for 100 epochs with mini-batch size 256 and divide the learning rate by 10 at epochs 30, 60 and 90. We use stochastic gradient descent (SGD) as the optimizer with Nesterov momentum of 0.9 without dampening, and a weight decay of 10^-4 for all training settings.
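
The Research Type and Pseudocode rows above describe training one set of shared weights under several width switches, with a private batch normalization layer per switch (switchable batch normalization) and a single parameter update per iteration. The following is a minimal PyTorch-style sketch of that idea, not the authors' released implementation: WIDTH_MULTS, SwitchableBatchNorm2d, set_active_width and train_step are illustrative names, and the slicing of convolutional channels per switch is omitted for brevity.

    import torch
    import torch.nn as nn

    # Hypothetical width switches; the paper's MobileNet v1 setting uses these values.
    WIDTH_MULTS = [0.25, 0.5, 0.75, 1.0]

    class SwitchableBatchNorm2d(nn.Module):
        """Switchable batch normalization: one private BatchNorm2d per width switch.
        Convolutional weights are shared across switches; only normalization
        statistics and affine parameters are kept separately."""
        def __init__(self, num_features_per_switch):
            super().__init__()
            self.bns = nn.ModuleList(
                [nn.BatchNorm2d(nf) for nf in num_features_per_switch])
            self.active_idx = len(num_features_per_switch) - 1  # default: full width

        def forward(self, x):
            return self.bns[self.active_idx](x)

    def set_active_width(model, width_idx):
        # Hypothetical helper: point every switchable BN at the requested switch.
        for m in model.modules():
            if isinstance(m, SwitchableBatchNorm2d):
                m.active_idx = width_idx

    def train_step(model, images, labels, criterion, optimizer):
        # One iteration in the spirit of Algorithm 1: accumulate gradients from
        # every width switch, then apply a single update to the shared weights.
        optimizer.zero_grad()
        for idx, _width in enumerate(WIDTH_MULTS):
            set_active_width(model, idx)
            loss = criterion(model(images), labels)
            loss.backward()  # gradients accumulate across switches
        optimizer.step()

The separate BN layers add only a small number of extra parameters relative to the shared convolutions, which is what keeps per-switch normalization cheap in this scheme.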
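
The Experiment Setup row maps directly onto standard optimizer and scheduler configurations. The snippet below is a sketch of the MobileNet v1/v2 schedule under the stated hyperparameters (SGD with Nesterov momentum 0.9, no dampening, weight decay 1e-4, learning rate 0.045 decayed exponentially by 0.98 per epoch for 480 epochs); the model and the data-loading loop are assumed to be defined elsewhere.

    import torch

    # model: a slimmable network (e.g. MobileNet v1) assumed to be built elsewhere.
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.045,          # initial learning rate for MobileNet v1/v2
        momentum=0.9,      # Nesterov momentum weight
        dampening=0,       # "without dampening"
        weight_decay=1e-4,
        nesterov=True,
    )

    # Exponential decay, gamma = 0.98 per epoch, over 480 epochs (mini-batch size 160).
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)

    # The ResNet-50 setting would instead use a step schedule, e.g.:
    # torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)

    for epoch in range(480):
        # ... one epoch over the ImageNet training set ...
        scheduler.step()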