KDGAN: Knowledge Distillation with Generative Adversarial Networks

Authors: Xiaojie Wang, Rui Zhang, Yu Sun, Jianzhong Qi

NeurIPS 2018

Reproducibility variables (result, with the supporting LLM response):

- Research Type: Experimental. "Extensive experiments using real datasets confirm the superiority of KDGAN in both accuracy and training speed."
- Researcher Affiliation: Collaboration. Xiaojie Wang (University of Melbourne, xiaojiew94@gmail.com), Rui Zhang (University of Melbourne, rui.zhang@unimelb.edu.au), Yu Sun (Twitter Inc., ysun@twitter.com), Jianzhong Qi (University of Melbourne, jianzhong.qi@unimelb.edu.au).
- Pseudocode: Yes. Algorithm 1: "Minibatch stochastic gradient descent training of KDGAN." (A hedged training-loop sketch follows this list.)
- Open Source Code: Yes. "The code and the data are made available at https://github.com/xiaojiew1/KDGAN/."
- Open Datasets: Yes. "We use the widely adopted MNIST [27] and CIFAR-10 [26] datasets. We use the Yahoo Flickr Creative Commons 100 Million (YFCC100M) dataset [45] in the experiments."
- Dataset Splits: No. The paper tunes hyperparameters "based on validation performance" but reports split sizes only for the training and test sets, not for a validation set.
- Hardware Specification: No. The paper contrasts a "powerful server" with a "mobile phone" when motivating the problem, but does not state the CPU, GPU, or other hardware used to run the experiments.
- Software Dependencies: No. The paper mentions TensorFlow [1] without a version number; other components such as VGGNet, LSTMs, and word embeddings are also mentioned without version details.
- Experiment Setup: Yes. "We use two formulations of the distillation losses including the L2 loss [7] and the Kullback-Leibler divergence [23]. We search for the optimal values for the hyperparameters α in [0.0, 1.0], β in [0.001, 1000], and γ in [0.0001, 100] based on validation performance. We find that a reasonable annealing schedule for the temperature parameter τ is to start with a large value (1.0) and exponentially decay it to a small value (0.1)." (A sketch of these losses and the annealing schedule follows the list.)
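The Pseudocode item refers to Algorithm 1, the minibatch stochastic gradient descent training of KDGAN. Below is a heavily simplified, hypothetical PyTorch sketch of what such alternating updates could look like for the three players (a lightweight classifier, a teacher, and a discriminator), using Gumbel-Softmax label sampling and distillation terms. The paper's released implementation is in TensorFlow; every model size, loss weight, and function name here is an illustrative assumption, not the authors' code.

```python
# Hypothetical sketch of KDGAN-style alternating minibatch updates.
# All architectures, weights (alpha, tau), and names are assumptions.
import torch
import torch.nn.functional as F
from torch import nn

num_features, num_classes = 784, 10
classifier = nn.Linear(num_features, num_classes)          # lightweight student C
teacher = nn.Sequential(nn.Linear(num_features, 256), nn.ReLU(),
                        nn.Linear(256, num_classes))       # larger teacher T
discriminator = nn.Linear(num_features + num_classes, 1)   # D scores (x, label) pairs

opt_c = torch.optim.SGD(classifier.parameters(), lr=1e-2)
opt_t = torch.optim.SGD(teacher.parameters(), lr=1e-2)
opt_d = torch.optim.SGD(discriminator.parameters(), lr=1e-2)

alpha, tau = 0.5, 1.0  # assumed adversarial/distillation trade-off and temperature

def d_logit(x, y_dist):
    # Discriminator logit for an (instance, label distribution) pair.
    return discriminator(torch.cat([x, y_dist], dim=1)).squeeze(1)

def train_step(x, y):
    real = torch.ones(x.size(0))
    fake = torch.zeros(x.size(0))
    y_true = F.one_hot(y, num_classes).float()

    # 1) Update D: separate true labels from pseudo-labels sampled by C and T.
    with torch.no_grad():
        y_c = F.gumbel_softmax(classifier(x), tau=tau)  # differentiable label samples
        y_t = F.gumbel_softmax(teacher(x), tau=tau)
    loss_d = (F.binary_cross_entropy_with_logits(d_logit(x, y_true), real)
              + F.binary_cross_entropy_with_logits(d_logit(x, y_c), fake)
              + F.binary_cross_entropy_with_logits(d_logit(x, y_t), fake))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # 2) Update T: fool D while staying close to the classifier (distillation).
    logits_t = teacher(x)
    p_c = F.softmax(classifier(x).detach() / tau, dim=1)
    loss_t = (F.binary_cross_entropy_with_logits(
                  d_logit(x, F.gumbel_softmax(logits_t, tau=tau)), real)
              + F.kl_div(F.log_softmax(logits_t / tau, dim=1), p_c,
                         reduction="batchmean"))
    opt_t.zero_grad()
    loss_t.backward()
    opt_t.step()

    # 3) Update C: fool D while distilling from the teacher.
    logits_c = classifier(x)
    p_t = F.softmax(teacher(x).detach() / tau, dim=1)
    loss_c = (alpha * F.binary_cross_entropy_with_logits(
                  d_logit(x, F.gumbel_softmax(logits_c, tau=tau)), real)
              + (1 - alpha) * F.kl_div(F.log_softmax(logits_c / tau, dim=1), p_t,
                                       reduction="batchmean"))
    opt_c.zero_grad()
    loss_c.backward()
    opt_c.step()
    return loss_d.item(), loss_t.item(), loss_c.item()

# Toy usage on one random minibatch:
x = torch.randn(32, num_features)
y = torch.randint(0, num_classes, (32,))
print(train_step(x, y))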
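The Experiment Setup item quotes two distillation losses and a temperature annealing schedule. The following NumPy sketch shows, for illustration only, one common form of each loss and an exponential decay of τ from 1.0 to 0.1; the function names and the exact decay formula are assumptions rather than details taken from the paper.

```python
# Illustrative sketch (not the authors' code) of L2 and KL distillation losses
# and an exponential temperature schedule decaying tau from 1.0 to 0.1.
import numpy as np

def softmax(logits, tau=1.0):
    z = logits / tau
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def l2_distillation(student_logits, teacher_logits):
    # Squared error between logits, one common form of the L2 distillation loss.
    return 0.5 * np.mean(np.sum((student_logits - teacher_logits) ** 2, axis=1))

def kl_distillation(student_logits, teacher_logits, tau=1.0, eps=1e-12):
    # KL(teacher || student) on temperature-softened distributions.
    p_t = softmax(teacher_logits, tau)
    p_s = softmax(student_logits, tau)
    return np.mean(np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=1))

def annealed_tau(step, total_steps, tau_start=1.0, tau_end=0.1):
    # Exponential decay from tau_start to tau_end over the course of training.
    return tau_start * (tau_end / tau_start) ** (step / total_steps)

# tau moves from 1.0 through ~0.316 at the midpoint down to 0.1 at the end:
for step in (0, 5000, 10000):
    print(round(annealed_tau(step, total_steps=10000), 3))
```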