Momentum Adversarial Distillation: Handling Large Distribution Shifts in Data-Free Knowledge Distillation

Authors: Kien Do, Thai Hung Le, Dung Nguyen, Dang Nguyen, Haripriya Harikumar, Truyen Tran, Santu Rana, Svetha Venkatesh

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on six benchmark datasets including big datasets like ImageNet and Places365 demonstrate the superior performance of MAD over competing methods for handling the large distribution shift problem. Our method also compares favorably to existing DFKD methods and even achieves state-of-the-art results in some cases. (Section 5, Experiments)
Researcher Affiliation | Academia | Kien Do, Hung Le, Dung Nguyen, Dang Nguyen, Haripriya Harikumar, Truyen Tran, Santu Rana, Svetha Venkatesh; Applied Artificial Intelligence Institute (A2I2), Deakin University, Australia
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about releasing source code for the described methodology or links to a code repository.
Open Datasets | Yes | We consider the image classification task and evaluate our proposed method on 3 small image datasets (CIFAR10 [24], CIFAR100 [24], Tiny ImageNet [25]), and 3 large image datasets (ImageNet [9], Places365 [51], Food101 [4]).
Dataset Splits | No | The paper lists standard datasets like CIFAR10, CIFAR100, and ImageNet but does not explicitly specify the train/validation/test splits used for these datasets in the experimental setup.
Hardware Specification | No | The paper does not specify the exact hardware (e.g., specific GPU or CPU models) used for running the experiments.
Software Dependencies | No | The paper mentions using PyTorch and optimizers like SGD and Adam, but it does not specify version numbers for these software components or any other libraries.
Experiment Setup | Yes | If not otherwise specified, we set the momentum α in Eq. 5 to 0.95 and the length of the noise vector to 256. We train the student S using SGD and Adam for the small and large datasets, respectively. We train the generator G using Adam for both the small and large datasets.
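
Since the paper does not release code, the sketch below is only a rough illustration of how the reported settings could be wired up in PyTorch: a momentum (EMA) copy of the generator with α = 0.95 (assuming Eq. 5 is a standard exponential-moving-average update of the generator weights), a 256-dimensional noise vector, SGD for the student, and Adam for the generator. The network architectures, learning rates, and the ema_update helper are hypothetical and not taken from the paper.

```python
import copy
import torch
import torch.nn as nn

ALPHA = 0.95     # momentum alpha reported for Eq. 5
NOISE_DIM = 256  # length of the noise vector reported in the paper

# Placeholder networks; the actual architectures are not given in this row.
generator = nn.Sequential(nn.Linear(NOISE_DIM, 512), nn.ReLU(),
                          nn.Linear(512, 3 * 32 * 32))
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

# Momentum (EMA) copy of the generator, never updated by backpropagation.
ema_generator = copy.deepcopy(generator)
for p in ema_generator.parameters():
    p.requires_grad_(False)

# Optimizers as reported: SGD for the student on small datasets (Adam on the
# large ones), Adam for the generator. Learning rates here are illustrative.
student_opt = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.9)
generator_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)

@torch.no_grad()
def ema_update(ema_model, model, alpha=ALPHA):
    # theta_ema <- alpha * theta_ema + (1 - alpha) * theta  (assumed form of Eq. 5)
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(alpha).add_(p, alpha=1.0 - alpha)

# After each gradient step on the generator, refresh the momentum copy:
ema_update(ema_generator, generator)
```

In this reading, the EMA generator changes slowly (α = 0.95), which is how MAD is described as smoothing the distribution of synthetic samples the student sees between generator updates.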