Momentum Adversarial Distillation: Handling Large Distribution Shifts in Data-Free Knowledge Distillation
Authors: Kien Do, Thai Hung Le, Dung Nguyen, Dang Nguyen, Haripriya Harikumar, Truyen Tran, Santu Rana, Svetha Venkatesh
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on six benchmark datasets including big datasets like ImageNet and Places365 demonstrate the superior performance of MAD over competing methods for handling the large distribution shift problem. Our method also compares favorably to existing DFKD methods and even achieves state-of-the-art results in some cases. |
| Researcher Affiliation | Academia | Kien Do, Hung Le, Dung Nguyen, Dang Nguyen, Haripriya Harikumar, Truyen Tran, Santu Rana, Svetha Venkatesh Applied Artificial Intelligence Institute (A2I2), Deakin University, Australia |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code for the described methodology or links to a code repository. |
| Open Datasets | Yes | We consider the image classification task and evaluate our proposed method on 3 small image datasets (CIFAR10 [24], CIFAR100 [24], Tiny ImageNet [25]), and 3 large image datasets (ImageNet [9], Places365 [51], Food101 [4]). |
| Dataset Splits | No | The paper lists standard datasets like CIFAR10, CIFAR100, and ImageNet but does not explicitly specify the train/validation/test splits used for these datasets in the experimental setup. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., specific GPU or CPU models) used for running the experiments. |
| Software Dependencies | No | The paper mentions using PyTorch and optimizers like SGD and Adam, but it does not specify version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | If not otherwise specified, we set the momentum α in Eq. 5 to 0.95 and the length of the noise vector to 256. We train the student S using SGD and Adam for the small and large datasets, respectively. We train the generator G using Adam for both the small and large datasets. |
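The quoted setup fixes the momentum α in Eq. 5 to 0.95, which governs the exponential-moving-average ("momentum") copy of the generator that gives MAD its name. A minimal sketch of such an update is shown below; the function name `ema_update` and the plain-list parameter representation are illustrative assumptions, not the paper's implementation.

```python
def ema_update(ema_params, params, alpha=0.95):
    """One momentum (EMA) step over matching parameter lists:
    ema <- alpha * ema + (1 - alpha) * current.
    `alpha=0.95` mirrors the momentum value reported in the paper;
    the list-of-floats representation stands in for real tensors."""
    return [alpha * e + (1.0 - alpha) * p for e, p in zip(ema_params, params)]

# Usage: after each generator update, refresh the slow-moving copy.
ema = [0.0, 1.0]       # slow (momentum) generator parameters
current = [1.0, 0.0]   # parameters after the latest optimizer step
ema = ema_update(ema, current)
```

With α close to 1, the momentum copy changes slowly between steps, which is how the method damps large distribution shifts in the generated data seen by the student.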