Up to 100x Faster Data-Free Knowledge Distillation
Authors: Gongfan Fang, Kanya Mo, Xinchao Wang, Jie Song, Shitao Bei, Haofei Zhang, Mingli Song
AAAI 2022, pp. 6597-6604
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments over CIFAR, NYUv2, and ImageNet demonstrate that the proposed FastDFKD achieves 10× and even 100× acceleration while preserving performance on par with the state of the art. |
| Researcher Affiliation | Collaboration | Gongfan Fang (1,3)*, Kanya Mo (1), Xinchao Wang (2), Jie Song (1), Shitao Bei (1), Haofei Zhang (1), Mingli Song (1,3). Affiliations: 1 Zhejiang University; 2 National University of Singapore; 3 Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies |
| Pseudocode | Yes | Algorithm 1: FastDFKD (a hedged sketch of the feature-reuse loop appears after the table). |
| Open Source Code | Yes | Code is available at https://github.com/zju-vipa/Fast-Datafree. |
| Open Datasets | Yes | We evaluate the proposed method on both classification and semantic segmentation tasks. For image classification, we conduct data-free knowledge distillation on three widely used datasets: CIFAR-10, CIFAR-100 (Krizhevsky, Hinton et al. 2009) and ImageNet (Deng et al. 2009). ... For semantic segmentation, we use DeepLab models (Chen et al. 2017) trained on the NYUv2 (Nathan Silberman and Fergus 2012) dataset for training and evaluation... |
| Dataset Splits | No | No specific training/validation/test dataset splits (e.g., percentages, absolute counts, or explicit mention of validation set split) are detailed in the paper text. |
| Hardware Specification | No | "For fair comparisons, all GPU hours are estimated on a single GPU." This is too general: neither the GPU model nor any other hardware details are specified. |
| Software Dependencies | No | No specific software dependencies with version numbers were mentioned. |
| Experiment Setup | Yes | "We use the pretrained models from (Fang et al. 2021b) and follow the same training protocol for comparison, where 50,000 synthetic images are synthesized for distillation." and "For example, DeepInv-2k synthesizes images by optimizing mini-batches, each of which requires 2,000 iterations to converge (Yin et al. 2019). To obtain 50,000 training samples for CIFAR, DeepInv-2k would take 42.1 hours for data synthesis on a single GPU. By contrast, our method, i.e., Fast-5, adopts the same inversion loss as DeepInv but only requires 5 steps for each batch owing to the proposed common feature reusing, which is much more efficient than DeepInversion." (A back-of-envelope check of these figures appears after the table.) |
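For readers who want a concrete picture of Algorithm 1 (FastDFKD), below is a minimal PyTorch sketch of the common-feature-reuse idea, not the authors' implementation; see the linked repository for that. The generator and teacher interfaces, the `bn_regularizer` helper, the hyperparameter values, and the Reptile-style meta update are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def bn_regularizer(teacher, x):
    """DeepInversion-style prior (sketch): match the batch statistics of
    synthetic images to the teacher's running BatchNorm statistics."""
    losses, hooks = [], []

    def hook(module, inputs, output):
        feat = inputs[0]
        mean = feat.mean(dim=(0, 2, 3))
        var = feat.var(dim=(0, 2, 3), unbiased=False)
        losses.append(F.mse_loss(mean, module.running_mean) +
                      F.mse_loss(var, module.running_var))

    for m in teacher.modules():
        if isinstance(m, nn.BatchNorm2d):
            hooks.append(m.register_forward_hook(hook))
    logits = teacher(x)
    for h in hooks:
        h.remove()
    return logits, sum(losses)

def fast_synthesize(meta_gen, teacher, num_classes=10, k_steps=5,
                    fast_lr=1e-3, meta_lr=0.5, batch_size=256, z_dim=256):
    """One synthesis round in the spirit of Fast-5 (illustrative sketch).

    DeepInversion optimizes every mini-batch from scratch (~2,000 steps);
    here a meta generator carries 'common features' across batches, so
    each new batch needs only k_steps adaptation steps.
    """
    fast = copy.deepcopy(meta_gen)                      # per-batch fast weights
    opt = torch.optim.Adam(fast.parameters(), lr=fast_lr)
    z = torch.randn(batch_size, z_dim)                  # device handling omitted
    targets = torch.randint(0, num_classes, (batch_size,))

    for _ in range(k_steps):                            # 5 steps vs. 2,000
        logits, bn_loss = bn_regularizer(teacher, fast(z))
        loss = F.cross_entropy(logits, targets) + bn_loss
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Reptile-style meta update (an assumption standing in for the paper's
    # meta-learning step): pull the slow weights toward the adapted ones so
    # the next batch starts from reusable common features.
    with torch.no_grad():
        for p_meta, p_fast in zip(meta_gen.parameters(), fast.parameters()):
            p_meta.add_(meta_lr * (p_fast - p_meta))

    return fast(z).detach(), targets
```

Looping `fast_synthesize` until 50,000 images are accumulated, and interleaving standard KD updates of the student on each synthetic batch, would mirror the training protocol quoted in the table.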
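As a sanity check on the figures quoted in the Experiment Setup row, the arithmetic below assumes synthesis time scales linearly with the number of optimization steps per batch; fixed per-batch overheads and the distillation cost itself are ignored, which is why the paper reports "10× and even 100×" rather than the idealized 400×.

```python
# Back-of-envelope check of the quoted synthesis cost (illustrative only;
# assumes wall-clock time scales linearly with optimization steps).
DEEPINV_STEPS = 2_000    # DeepInv-2k: iterations per mini-batch (Yin et al. 2019)
FAST_STEPS = 5           # Fast-5: adaptation steps per mini-batch
DEEPINV_HOURS = 42.1     # reported single-GPU time for 50,000 CIFAR samples

step_speedup = DEEPINV_STEPS / FAST_STEPS
print(f"per-batch step reduction: {step_speedup:.0f}x")                          # 400x
print(f"idealized Fast-5 synthesis time: {DEEPINV_HOURS / step_speedup:.2f} h")  # ~0.11 h
```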