Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation

Authors: Zhiwei Hao, Jianyuan Guo, Ding Jia, Kai Han, Yehui Tang, Chao Zhang, Han Hu, Yunhe Wang

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed method on the ImageNet-1k classification task. The proposed manifold KD outperforms the distillation method in [16] by +2.0% top-1 accuracy on DeiT-Tiny. We also conduct transfer learning experiments on CIFAR-10/100 and evaluate our method on downstream tasks such as object detection and semantic segmentation.
Researcher Affiliation | Collaboration | Zhiwei Hao1,2, Jianyuan Guo2, Ding Jia2,3, Kai Han2, Yehui Tang2,3, Chao Zhang3, Han Hu1, Yunhe Wang2. 1School of Information and Electronics, Beijing Institute of Technology. 2Huawei Noah's Ark Lab. 3Key Laboratory of Machine Perception (MOE), School of Intelligence Science and Technology, Peking University.
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | PyTorch code: https://github.com/Hao840/manifolddistillation and https://github.com/huawei-noah/Efficient-Computing.
Open Datasets | Yes | We evaluate our fine-grained manifold distillation method on ImageNet-1k [39] classification task, CIFAR-10/100 [40] transfer learning task, COCO [41] object detection task, and ADE20K [42] semantic segmentation task.
Dataset Splits | Yes | ImageNet-1k... consists of more than 1.2M training images and 50K validation images from 1000 classes.
Hardware Specification | Yes | Each student is trained for 300 epochs with 8 Tesla-V100 GPUs.
Software Dependencies | No | Our implementation is based on the PyTorch framework [43] and the MindSpore Lite tool [44]. (No specific PyTorch version is mentioned, and 'MindSpore Lite' lacks a precise version such as 2.x.x.)
Experiment Setup | Yes | The hyper-parameter λ in the KD loss is set to 1, i.e., the real label is not used to train the student. When the teacher is smaller than the student, to prevent the performance degradation caused by the weak teacher, we set λ to 0.5. In the fine-grained manifold distillation loss, hyper-parameters α, β, and γ are set to 4, 0.1, and 0.2, respectively. The sampling number K in the loss term Lrandom is set to 192. ... Each student is trained for 300 epochs with 8 Tesla-V100 GPUs.
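
To make the reported setup concrete, the sketch below shows one way the quoted weights (λ = 1, α = 4, β = 0.1, γ = 0.2, K = 192) could be combined into a single PyTorch training loss. It is a minimal illustration only: the function names (relation_map, manifold_kd_loss), the cosine-similarity relation maps, and the MSE matching are assumptions, not the authors' released implementation (see the repositories linked above for that).

```python
import torch
import torch.nn.functional as F

# Weights as quoted in the experiment setup; the loss assembly itself is a
# hypothetical sketch, not the authors' released code.
LAMBDA, ALPHA, BETA, GAMMA, K = 1.0, 4.0, 0.1, 0.2, 192


def relation_map(feats: torch.Tensor) -> torch.Tensor:
    """Pairwise similarity map over L2-normalized token features.

    feats: (groups, tokens, dim). The exact manifold construction in the paper
    may differ; this assumes a cosine-similarity relation map, which makes the
    result independent of the (possibly different) teacher/student feature dim.
    """
    feats = F.normalize(feats, dim=-1)
    return feats @ feats.transpose(-2, -1)  # (groups, tokens, tokens)


def manifold_kd_loss(student_feats, teacher_feats, student_logits, teacher_logits):
    """Combine soft-label KD with intra-image, inter-image, and randomly
    sampled manifold terms, weighted by lambda, alpha, beta, gamma.

    Features are assumed to be patch tokens of shape (batch, patches, dim),
    with matching batch and patch counts for teacher and student.
    """
    # Soft-label KD on logits; the hard-label cross-entropy term is omitted
    # because lambda = 1 in the quoted setup (real labels unused).
    kd = F.kl_div(F.log_softmax(student_logits, dim=-1),
                  F.softmax(teacher_logits, dim=-1), reduction="batchmean")

    # Intra-image term: match per-image patch-to-patch relation maps.
    intra = F.mse_loss(relation_map(student_feats), relation_map(teacher_feats))

    # Inter-image term: match image-to-image relation maps at each patch index.
    inter = F.mse_loss(relation_map(student_feats.transpose(0, 1)),
                       relation_map(teacher_feats.transpose(0, 1)))

    # Random term: relation map over K patch embeddings sampled across the batch.
    flat_s, flat_t = student_feats.flatten(0, 1), teacher_feats.flatten(0, 1)
    idx = torch.randperm(flat_s.size(0))[:K]
    rand = F.mse_loss(relation_map(flat_s[idx].unsqueeze(0)),
                      relation_map(flat_t[idx].unsqueeze(0)))

    return LAMBDA * kd + ALPHA * intra + BETA * inter + GAMMA * rand
```

Because each relation map compares tokens with each other rather than with a fixed target dimension, this style of loss sidesteps the width mismatch between teacher and student features, which is consistent with how the paper motivates manifold-space matching.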