Model Rubik’s Cube: Twisting Resolution, Depth and Width for TinyNets

Authors: Kai Han, Yunhe Wang, Qiulin Zhang, Wei Zhang, Chunjing Xu, Tong Zhang

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the ImageNet benchmark illustrate that our TinyNet performs much better than the smaller version of EfficientNets using the inversed giant formula. For instance, our TinyNet-E achieves a 59.9% Top-1 accuracy with only 24M FLOPs, which is about 1.9% higher than that of the previous best MobileNetV3 with similar computational cost.
Researcher Affiliation | Collaboration | Kai Han (1,2), Yunhe Wang (1), Qiulin Zhang (1,3), Wei Zhang (1), Chunjing Xu (1), Tong Zhang (4). Affiliations: 1 Noah's Ark Lab, Huawei Technologies; 2 State Key Lab of Computer Science, ISCAS & UCAS; 3 BUPT; 4 HKUST.
Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code will be available at https://github.com/huawei-noah/ghostnet/tree/master/tinynet_pytorch and https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/tinynet.
Open Datasets | Yes | ImageNet ILSVRC2012 dataset [4] is a large-scale image classification dataset containing 1.2 million images for training and 50,000 validation images belonging to 1,000 categories. ... ImageNet-100 is the subset of ImageNet-1000 that contains 100 randomly sampled classes. 500 training images are randomly sampled for each class, and the corresponding 5,000 images are used as the validation set.
Dataset Splits | Yes | ImageNet ILSVRC2012 dataset [4] is a large-scale image classification dataset containing 1.2 million images for training and 50,000 validation images belonging to 1,000 categories. ... ImageNet-100 is the subset of ImageNet-1000 that contains 100 randomly sampled classes. 500 training images are randomly sampled for each class, and the corresponding 5,000 images are used as the validation set. (A hedged sketch of this subset construction appears after the table.)
Hardware Specification | Yes | All the models are implemented using PyTorch [33] and trained on NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper mentions 'PyTorch [33]' as the implementation framework, but it does not give version numbers for PyTorch or any other software dependency, which would be needed for a fully reproducible setup.
Experiment Setup | Yes | We train the models for 450 epochs using the RMSProp optimizer with momentum 0.9 and decay 0.9. The weight decay is 1e-5 and batch normalization momentum is set as 0.99. The initial learning rate is 0.048 and decays by 0.97 every 2.4 epochs. Learning rate warmup [6] is applied for the first 3 epochs. The batch size is 1024 for 8 GPUs with 128 images per chip. The dropout of 0.2 is applied on the last fully-connected layer for regularization. We also use exponential moving average (EMA) with decay 0.9999. For ResNets, the models are trained for 90 epochs with batch size of 1024. SGD optimizer with the momentum 0.9 and weight decay 1e-4 is used to update the weights. The learning rate starts from 0.4 and decays by 0.1 every 30 epochs. (A hedged PyTorch sketch of the TinyNet configuration appears after the table.)
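
As referenced in the Dataset Splits row, ImageNet-100 is described as 100 randomly sampled classes with 500 randomly sampled training images per class and the corresponding 5,000 validation images. The snippet below is a minimal sketch of how such a subset could be assembled; the ImageFolder-style directory layout, the seed, and the helper name build_imagenet_100 are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch: build an ImageNet-100-style subset (100 random classes,
# 500 random training images per class, all validation images of those classes,
# i.e. 5,000 in total). Paths, seed, and helper name are illustrative only.
import random
from pathlib import Path
from shutil import copyfile


def build_imagenet_100(src_root: str, dst_root: str, seed: int = 0) -> None:
    rng = random.Random(seed)
    src, dst = Path(src_root), Path(dst_root)

    # Randomly sample 100 of the 1,000 class directories under train/.
    classes = sorted(p.name for p in (src / "train").iterdir() if p.is_dir())
    chosen = rng.sample(classes, 100)

    for split, per_class in (("train", 500), ("val", None)):
        for wnid in chosen:
            images = sorted((src / split / wnid).glob("*.JPEG"))
            if per_class is not None:
                # 500 randomly sampled training images per class.
                images = rng.sample(images, per_class)
            out_dir = dst / split / wnid
            out_dir.mkdir(parents=True, exist_ok=True)
            for img in images:
                copyfile(img, out_dir / img.name)


build_imagenet_100("/data/imagenet", "/data/imagenet-100")
```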
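The TinyNet schedule quoted in the Experiment Setup row maps fairly directly onto PyTorch primitives. The sketch below is a hedged illustration, not the authors' released code: the stand-in model, the dummy batch, and the single optimisation step are placeholders, while the optimizer, learning-rate schedule, warmup, dropout, and EMA values follow the quoted numbers (PyTorch's RMSprop exposes the quoted "decay 0.9" as alpha=0.9).

```python
# Hedged sketch of the reported TinyNet training configuration:
# 450 epochs, RMSProp (momentum 0.9, decay 0.9), weight decay 1e-5,
# initial LR 0.048 decayed by 0.97 every 2.4 epochs after a 3-epoch warmup,
# dropout 0.2 before the classifier, and weight EMA with decay 0.9999.
import copy
import torch
import torch.nn as nn

# Stand-in for a TinyNet backbone (assumption); note the 0.2 dropout.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Dropout(p=0.2),
    nn.Linear(8, 1000),
)

optimizer = torch.optim.RMSprop(
    model.parameters(), lr=0.048, alpha=0.9, momentum=0.9, weight_decay=1e-5,
)

warmup_epochs, decay_rate, decay_every = 3, 0.97, 2.4


def lr_factor(epoch: int) -> float:
    # Linear warmup for 3 epochs, then a smooth reading of "x0.97 every 2.4 epochs".
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    return decay_rate ** ((epoch - warmup_epochs) / decay_every)


scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)

# Exponential moving average of the weights, decay 0.9999.
ema_model = copy.deepcopy(model)


@torch.no_grad()
def update_ema(decay: float = 0.9999) -> None:
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)


# One dummy optimisation step to show how the pieces fit together.
images, targets = torch.randn(4, 3, 224, 224), torch.randint(0, 1000, (4,))
loss = nn.functional.cross_entropy(model(images), targets)
loss.backward()
optimizer.step()
update_ema()
scheduler.step()
```

A real run would replace the dummy batch with the ImageNet loader (effective batch size 1024 across 8 V100 GPUs, 128 images per chip), iterate for 450 epochs, and evaluate the EMA weights rather than the raw model.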