Robust and Resource-Efficient Data-Free Knowledge Distillation by Generative Pseudo Replay
Authors: Kuluhan Binici, Shivam Aggarwal, Nam Trung Pham, Karianto Leman, Tulika Mitra (pp. 6089-6096)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experiments on image classification benchmarks show that our method optimizes the expected value of the distilled model accuracy while eliminating the large memory overhead incurred by the sample-storing methods." ... "Extensive experimental evaluation of our approach in comparison with the state-of-the-art." ... "Experimental Evaluation: We demonstrate the effectiveness of PRE-DFKD in improving the expected student accuracy and reducing resource utilization on several image classification benchmarks." |
| Researcher Affiliation | Collaboration | Kuluhan Binici (1,2), Shivam Aggarwal (2), Nam Trung Pham (1), Karianto Leman (1), Tulika Mitra (2). (1) Institute for Infocomm Research, A*STAR, Singapore; (2) School of Computing, National University of Singapore |
| Pseudocode | Yes | Algorithm 1: Memory Sample Generator Training. Algorithm 2: Memory Sample Generator Inference. (A hedged sketch of such a memory generator appears after this table.) |
| Open Source Code | Yes | Our code and additional experimental details are available at https://github.com/kuluhan/PRE-DFKD. |
| Open Datasets | Yes | We use four datasets with different complexities and sizes. The simplest is MNIST (LeCun et al. 1998), which contains 32×32 grayscale images from ten classes. CIFAR10 (Krizhevsky and Hinton 2009) contains RGB images from ten classes (3×32×32). CIFAR100 (Krizhevsky and Hinton 2009) contains one hundred categories, while the samples have the same dimensions as those in CIFAR10. Lastly, the most complex dataset we use is Tiny ImageNet (Deng et al. 2009), which contains 64×64 RGB samples from 200 classes. (See the loading sketch after this table.) |
| Dataset Splits | Yes | Figure 1: Example of student accuracy degradation over distillation steps due to catastrophic forgetting. The red vertical line marks the epoch with peak accuracy on the validation dataset. The first row, "Train with data", shows the student accuracy when trained with the original dataset. The remaining rows report the accuracy for different baseline data-free KD methods and our approach (PRE-DFKD). As expected, the "Train with data" student accuracy is lower than the teacher accuracy, and the data-free methods yield lower student accuracy than "Train with data". |
| Hardware Specification | No | The paper mentions '2.8K NVIDIA V100 GPU-hours' in the context of the DeepInversion method (Yin et al. 2020) but does not provide specific hardware specifications (GPU models, CPU models, or memory details) for its own experiments. |
| Software Dependencies | No | The paper states 'For a fair comparison, we used the implementations of CMI, DAFL, and DFAD available from the authors' GitHub pages.' but does not provide specific software dependencies, such as programming languages or libraries with version numbers, for its own implementation or experimental setup. |
| Experiment Setup | Yes | Implementation Details: We run each method for 200, 200, 400, and 500 epochs for MNIST, CIFAR10, CIFAR100, and Tiny ImageNet, respectively. To evaluate each dataset and method pair, we conduct four runs. For MNIST, we select LeNet5 (LeCun et al. 1998) and LeNet5-half as the teacher-student pair. For the remaining datasets, we use ResNet34 (He et al. 2016) as the teacher and ResNet18 as the student. (A configuration sketch summarizing this setup follows the table.) |
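
The two algorithms quoted in the Pseudocode row describe training and sampling the memory generator that enables pseudo replay. As a rough illustration of the idea only (not the authors' exact procedure; the VAE architecture, loss weighting, and function names below are our assumptions), a small variational autoencoder can be fitted on each fresh batch of synthetic samples and later sampled to replay past distributions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryVAE(nn.Module):
    """Illustrative generative memory: a small convolutional VAE that
    summarizes past synthetic batches so they can be replayed later.
    Assumes 32x32 inputs (e.g., CIFAR-sized images)."""
    def __init__(self, img_channels=3, latent_dim=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(img_channels, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),            # 16 -> 8
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(64 * 8 * 8, latent_dim)
        self.fc_logvar = nn.Linear(64 * 8 * 8, latent_dim)
        self.dec_fc = nn.Linear(latent_dim, 64 * 8 * 8)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),           # 8 -> 16
            nn.ConvTranspose2d(32, img_channels, 4, stride=2, padding=1), nn.Tanh(), # 16 -> 32
        )
        self.latent_dim = latent_dim

    def decode(self, z):
        h = self.dec_fc(z).view(-1, 64, 8, 8)
        return self.dec(h)

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # reparameterization trick
        return self.decode(z), mu, logvar

def memory_train_step(vae, opt, synthetic_batch, beta=1.0):
    """Analogue of Algorithm 1: fit the memory generator on fresh synthetic samples."""
    recon, mu, logvar = vae(synthetic_batch)
    recon_loss = F.mse_loss(recon, synthetic_batch, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    loss = recon_loss + beta * kld
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

@torch.no_grad()
def replay_samples(vae, n):
    """Analogue of Algorithm 2: draw replay samples from the learned memory."""
    z = torch.randn(n, vae.latent_dim)
    return vae.decode(z)
```

Because the replayed samples come from a compact generative model rather than a stored buffer, this design avoids the memory overhead of sample-storing replay, which is the resource argument quoted in the Research Type row.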
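All four benchmarks are publicly available. A minimal loading sketch using torchvision (our choice of library; the paper does not state its data pipeline, and the Tiny ImageNet path below is a placeholder since torchvision does not bundle that dataset):

```python
import torchvision as tv
import torchvision.transforms as T

# MNIST is resized to 32x32 to match the resolution quoted above. Tiny ImageNet
# is read from a local ImageFolder copy; the path is a placeholder.
to_tensor = T.ToTensor()
mnist    = tv.datasets.MNIST("data", train=True, download=True,
                             transform=T.Compose([T.Resize(32), to_tensor]))
cifar10  = tv.datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)
cifar100 = tv.datasets.CIFAR100("data", train=True, download=True, transform=to_tensor)
tiny_in  = tv.datasets.ImageFolder("data/tiny-imagenet-200/train", transform=to_tensor)
```

Note that in the data-free setting the original training data is not available during distillation; these datasets serve only to train the teachers and to evaluate the students.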
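The quoted implementation details map directly onto a small run configuration. A hypothetical summary (the dictionary layout and constant names are ours, not taken from the released code):

```python
# Run configuration mirroring the quoted setup: epochs per dataset and the
# teacher/student architecture pairs.
RUNS = {
    "MNIST":        dict(epochs=200, teacher="LeNet5",   student="LeNet5-half"),
    "CIFAR10":      dict(epochs=200, teacher="ResNet34", student="ResNet18"),
    "CIFAR100":     dict(epochs=400, teacher="ResNet34", student="ResNet18"),
    "TinyImageNet": dict(epochs=500, teacher="ResNet34", student="ResNet18"),
}
N_RUNS = 4  # each dataset/method pair is evaluated over four runs
```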