Up to 100x Faster Data-Free Knowledge Distillation

Authors: Gongfan Fang, Kanya Mo, Xinchao Wang, Jie Song, Shitao Bei, Haofei Zhang, Mingli Song (pp. 6597-6604)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments over CIFAR, NYUv2, and ImageNet demonstrate that the proposed FastDFKD achieves 10x and even 100x acceleration while preserving performance on par with the state of the art.
Researcher Affiliation | Collaboration | Gongfan Fang (1,3*), Kanya Mo (1), Xinchao Wang (2), Jie Song (1), Shitao Bei (1), Haofei Zhang (1), Mingli Song (1,3); 1 Zhejiang University, 2 National University of Singapore, 3 Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies
Pseudocode | Yes | Algorithm 1: Fast DFKD
Open Source Code | Yes | Code is available at https://github.com/zju-vipa/Fast-Datafree.
Open Datasets | Yes | "We evaluate the proposed method on both classification and semantic segmentation tasks. For image classification, we conduct data-free knowledge distillation on three widely used datasets: CIFAR-10, CIFAR-100 (Krizhevsky, Hinton et al. 2009) and ImageNet (Deng et al. 2009). ... For semantic segmentation, we use Deeplab models (Chen et al. 2017) trained on the NYUv2 (Nathan Silberman and Fergus 2012) dataset for training and evaluation..."
Dataset Splits | No | No specific training/validation/test dataset splits (e.g., percentages, absolute counts, or explicit mention of a validation split) are detailed in the paper text.
Hardware Specification | No | "For fair comparisons, all GPU hours are estimated on a single GPU." This statement is too general and does not specify a GPU model or other hardware details.
Software Dependencies | No | No specific software dependencies with version numbers are mentioned.
Experiment Setup | Yes | "We use the pretrained models from (Fang et al. 2021b) and follow the same training protocol for comparison, where 50,000 synthetic images are synthesized for distillation." and "For example, DeepInv-2k synthesizes images by optimizing mini-batches, each of which requires 2,000 iterations to converge (Yin et al. 2019). To obtain 50,000 training samples for CIFAR, DeepInv-2k would take 42.1 hours for data synthesis on a single GPU. By contrast, our method, i.e., Fast-5, adopts the same inversion loss as DeepInv but only requires 5 steps for each batch owing to the proposed common feature reusing, which is much more efficient than DeepInversion."
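For context, the following is a minimal PyTorch sketch of the speedup mechanism quoted above: each synthetic batch starts from a reusable shared initialization (the paper's "common features") rather than random noise, so only a few inversion steps per batch are needed instead of thousands. This is not the authors' implementation (see the linked repository for that); the names inversion_loss, synthesize_batch, and common_init are hypothetical, and the objective here keeps only the classification term, whereas DeepInversion-style losses also match BatchNorm statistics and add image priors.

```python
# Hedged sketch, not the official Fast-Datafree code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def inversion_loss(teacher, images, targets):
    """Simplified inversion objective (classification prior only).
    The real DeepInversion loss also matches BatchNorm statistics
    and regularizes the images."""
    logits = teacher(images)
    return F.cross_entropy(logits, targets)


def synthesize_batch(teacher, common_init, targets, steps=5, lr=0.1):
    """Specialize a shared initialization into one synthetic batch.
    Starting from `common_init` (a stand-in for the reusable common
    features) lets a handful of steps suffice, instead of the ~2,000
    needed when starting from scratch."""
    images = common_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([images], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = inversion_loss(teacher, images, targets)
        loss.backward()
        opt.step()
    return images.detach()


# Usage sketch with a toy teacher (hypothetical shapes for CIFAR-like data).
teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
common_init = torch.randn(64, 3, 32, 32) * 0.1  # stand-in for learned common features
targets = torch.randint(0, 10, (64,))
fast_batch = synthesize_batch(teacher, common_init, targets, steps=5)  # "Fast-5" regime
```

In the paper's setting, the shared initialization is itself learned across batches so that it transfers, which is what makes the 5-step regime competitive with 2,000-step per-batch inversion.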