Residual Distillation: Towards Portable Deep Neural Networks without Shortcuts

Authors: Guilin Li, Junlei Zhang, Yunhe Wang, Chuanjian Liu, Matthias Tan, Yunfeng Lin, Wei Zhang, Jiashi Feng, Tong Zhang

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on ImageNet/CIFAR-10/CIFAR-100 demonstrate that the plain CNN network without shortcuts generated by our approach can achieve the same level of accuracy as that of the ResNet baseline while achieving about 1.4× speed-up and 1.25× memory reduction. We also verified the feature transferability of our ImageNet-pretrained plain-CNN network by fine-tuning it on MIT67 and Caltech101. Our results show that the performance of the plain-CNN is slightly higher than that of its baseline ResNet-50 on these two datasets. (The block-level difference between the two architectures is sketched below the table.)
Researcher Affiliation | Collaboration | Guilin Li (1), Junlei Zhang (1), Yunhe Wang (1), Chuanjian Liu (1), Matthias Tan (2), Yunfeng Lin (1), Wei Zhang (1), Jiashi Feng (3), Tong Zhang (4); (1) Noah's Ark Lab, Huawei Technologies, (2) CityU, (3) NUS, (4) HKUST
Pseudocode | Yes | Algorithm 1: Joint RD Algorithm
Open Source Code | Yes | The code will be available at https://github.com/leoozy/JointRD_Neurips2020 and the MindSpore code will be available at https://www.mindspore.cn/resources/hub.
Open Datasets | Yes | First, we verify the effectiveness of our algorithm through experiments with classification datasets: CIFAR-10 [39], CIFAR-100 [40] and ImageNet [41]. To evaluate the transferability of learned features of plain-CNN networks, we further finetune the ImageNet-pretrained plain-CNN models on two downstream task datasets: MIT67 [26] and Caltech101 [27].
Dataset Splits | Yes | The ImageNet dataset consists of 1.2M training images and 50K validation images with 1000 classes.
Hardware Specification | No | The paper mentions testing on a "mobile NPU" but does not provide specific hardware details such as the NPU model number, CPU, GPU, or memory specifications. The text is: "To verify the potential latency/memory improvement of removing shortcuts, we tested the max memory consumption and latency of plain-CNN 50 and ResNet-50 on a mobile NPU."
Software Dependencies | No | The paper states that "MindSpore code will be available at https://www.mindspore.cn/resources/hub" but does not specify its version or list any other software dependencies with version numbers.
Experiment Setup | Yes | We train all models for 200 epochs with a learning rate of 0.1, multiplied by 0.1 at epochs 100 and 150. For all models, we set λ = 0.001 and η decreasing with a cosine annealing policy from 1.0 to 0.5 over 60 epochs in Equation 3. Following [42], we train the whole networks for 120 epochs with an initial learning rate of 0.2, multiplied by 0.1 at epochs 30, 60 and 90. The batch size is 512. We set the weight decay to 1e-4 for ResNet-34 and plain-CNN 34. For plain-CNN 50, due to the removal of shortcuts, the weight decay used for ResNet is too high, so we set it to 1e-5. We set λ = 0.0001 and η decreasing with a cosine annealing policy from 1.0 to 0.5 over 60 epochs in Equation 3. (Both schedules are sketched at the end of this page.)
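The Research Type row quotes the core claim: a shortcut-free ("plain") CNN trained to match its ResNet baseline. The sketch below is not the authors' code; it is a minimal PyTorch illustration (class and layer names are my own) of what removing the shortcut means at the level of a basic block.

```python
import torch
import torch.nn as nn

# Illustrative only: these classes are not from the JointRD repository.
# They contrast a standard ResNet basic block with the shortcut-free
# "plain" block that Residual Distillation targets.

class BasicResidualBlock(nn.Module):
    """Standard ResNet basic block: two 3x3 convs plus an identity shortcut."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # shortcut: identity is added back in

class PlainBlock(nn.Module):
    """Same stack of convolutions with the shortcut removed (plain-CNN)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        return self.relu(self.bn2(self.conv2(out)))  # no skip connection

if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)
    assert BasicResidualBlock(64)(x).shape == PlainBlock(64)(x).shape
```

Dropping the addition means the input feature map no longer has to be kept around for the skip connection, which is consistent with the latency and memory savings quoted in the Research Type row.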
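The Experiment Setup row quotes two recipes: a 200-epoch run (lr 0.1, decayed at epochs 100/150, λ = 0.001) and a 120-epoch run (lr 0.2, decayed at epochs 30/60/90, batch size 512, λ = 0.0001), with η cosine-annealed from 1.0 to 0.5 over 60 epochs in both. Below is a minimal sketch of those schedules; the function names are my own, and holding η at 0.5 after epoch 60 is an assumption the quote does not confirm.

```python
import math

# Sketch of the step learning-rate and cosine eta schedules quoted above.

def step_lr(epoch: int, base_lr: float, milestones: tuple, gamma: float = 0.1) -> float:
    """Learning rate multiplied by `gamma` at each milestone epoch."""
    return base_lr * gamma ** sum(epoch >= m for m in milestones)

def cosine_eta(epoch: int, start: float = 1.0, end: float = 0.5, total: int = 60) -> float:
    """Cosine-anneal eta from `start` to `end` over `total` epochs, then hold (assumed)."""
    if epoch >= total:
        return end
    return end + (start - end) * 0.5 * (1.0 + math.cos(math.pi * epoch / total))

# 200-epoch recipe: lr 0.1, decayed at epochs 100 and 150.
lr_200 = [step_lr(e, 0.1, (100, 150)) for e in range(200)]
# 120-epoch recipe: lr 0.2, decayed at epochs 30, 60 and 90.
lr_120 = [step_lr(e, 0.2, (30, 60, 90)) for e in range(120)]
eta = [cosine_eta(e) for e in range(120)]

print(lr_200[99], lr_200[100])   # 0.1 -> 0.01 at the first milestone
print(lr_120[29], lr_120[30])    # 0.2 -> 0.02
print(round(eta[0], 3), round(eta[30], 3), round(eta[60], 3))  # 1.0, 0.75, 0.5
```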