Quantized Feature Distillation for Network Quantization
Authors: Ke Zhu, Yin-Yin He, Jianxin Wu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Quantitative results show that QFD is more flexible and effective (i.e., quantization friendly) than previous quantization methods. QFD surpasses existing methods by a noticeable margin on not only image classification but also object detection, albeit being much simpler. Furthermore, QFD quantizes ViT and Swin-Transformer on MS-COCO detection and segmentation, which verifies its potential in real world deployment. To the best of our knowledge, this is the first time that vision transformers have been quantized in object detection and image segmentation tasks. |
| Researcher Affiliation | Academia | Ke Zhu, Yin-Yin He, Jianxin Wu* State Key Laboratory for Novel Software Technology, Nanjing University, China zhuk@lamda.nju.edu.cn, heyy@lamda.nju.edu.cn, wujx2001@nju.edu.cn |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found; the method is described using equations and explanatory text. (A hedged code sketch of the core idea appears after this table.) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It mentions 'Detectron2. https://github.com/facebookresearch/detectron2', but this is a third-party tool the authors used, not their own implementation code. |
| Open Datasets | Yes | Following previous QAT works (Zhuang et al. 2020; Lee, Kim, and Ham 2021), we conduct our experiments on the CIFAR, ImageNet, CUB and MS-COCO datasets. |
| Dataset Splits | Yes | On both CIFAR datasets (Krizhevsky 2009), we use SGD with learning rate of 0.004, weight decay of 0.0005 and train 200 epochs in total. The input resolution is 32 × 32, and random flip and random crop are used as data augmentation. On ImageNet (Russakovsky et al. 2015), we train ResNet-18, ResNet-34 and MobileNet-v2 for 100 epochs. |
| Hardware Specification | Yes | All experiments use PyTorch (Paszke et al. 2019) with 8 GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch' and 'Detectron2' but does not provide specific version numbers for these software components. For example, it only says 'All experiments use PyTorch (Paszke et al. 2019)', where (Paszke et al. 2019) is a citation, not a version number. |
| Experiment Setup | Yes | On both CIFAR datasets (Krizhevsky 2009), we use SGD with learning rate of 0.004, weight decay of 0.0005 and train 200 epochs in total. The input resolution is 32 × 32, and random flip and random crop are used as data augmentation. On ImageNet (Russakovsky et al. 2015), we train ResNet-18, ResNet-34 and MobileNet-v2 for 100 epochs. The initial learning rate and the momentum are 0.01 and 0.9, respectively. The weight decay is set to 1e-4, 5e-5 and 2.5e-5 for 4-bit, 3-bit and 2-bit, respectively, following Han et al. (2021); Esser et al. (2020). We adopt random resized crop and random flip as data augmentation and set input resolution as 224 × 224. (The second sketch after this table turns these quoted hyper-parameters into a runnable configuration.) |
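
Since the paper provides no pseudocode (see the 'Pseudocode' row), the following is a minimal, hypothetical sketch of the idea named in the title: distilling a student feature toward a quantized teacher feature. The function names `fake_quantize` and `qfd_style_loss`, the MSE distance, the fixed `scale`, and the straight-through estimator are all illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def fake_quantize(x: torch.Tensor, scale: float, num_bits: int = 4) -> torch.Tensor:
    # Signed uniform quantizer: round(x / scale), clip to the b-bit integer
    # range, then de-quantize. The detach() trick is a straight-through
    # estimator, so gradients flow through the rounding as if it were identity.
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    q = torch.clamp(torch.round(x / scale), qmin, qmax)
    return x + (q * scale - x).detach()

def qfd_style_loss(student_feat, teacher_feat, scale: float = 0.1, num_bits: int = 4):
    # Distill the student feature toward the *quantized* teacher feature.
    # MSE is an assumed choice of distance; the paper specifies its loss
    # with equations that are not reproduced here.
    target = fake_quantize(teacher_feat, scale, num_bits).detach()
    return F.mse_loss(student_feat, target)

# Usage: features from a full-precision teacher and a trainable student.
teacher_feat = torch.randn(8, 512)
student_feat = torch.randn(8, 512, requires_grad=True)
loss = qfd_style_loss(student_feat, teacher_feat)
loss.backward()
```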
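
The quoted experiment setup also translates naturally into code. Below is a sketch assuming standard torchvision transforms; the `padding=4` on the CIFAR crop and the reuse of momentum 0.9 for CIFAR are assumptions, while the learning rates, weight decays, and resolutions come from the quoted text.

```python
import torch
import torchvision.transforms as T

# Per-bit-width weight decay on ImageNet, as quoted in the row above.
IMAGENET_WD = {4: 1e-4, 3: 5e-5, 2: 2.5e-5}

def build_optimizer(model: torch.nn.Module, dataset: str, num_bits: int = 4):
    if dataset == "cifar":
        lr, wd = 0.004, 5e-4                   # 200 epochs, 32 x 32 inputs
    elif dataset == "imagenet":
        lr, wd = 0.01, IMAGENET_WD[num_bits]   # 100 epochs, 224 x 224 inputs
    else:
        raise ValueError(f"unknown dataset: {dataset}")
    # Momentum 0.9 is stated only for ImageNet; reusing it for CIFAR is an assumption.
    return torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=wd)

# Augmentations as described: random crop + random flip (CIFAR),
# random resized crop + random flip (ImageNet).
cifar_aug = T.Compose([T.RandomCrop(32, padding=4), T.RandomHorizontalFlip(), T.ToTensor()])
imagenet_aug = T.Compose([T.RandomResizedCrop(224), T.RandomHorizontalFlip(), T.ToTensor()])
```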