KDGAN: Knowledge Distillation with Generative Adversarial Networks
Authors: Xiaojie Wang, Rui Zhang, Yu Sun, Jianzhong Qi
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments using real datasets confirm the superiority of KDGAN in both accuracy and training speed. |
| Researcher Affiliation | Collaboration | Xiaojie Wang (University of Melbourne, xiaojiew94@gmail.com); Rui Zhang (University of Melbourne, rui.zhang@unimelb.edu.au); Yu Sun (Twitter Inc., ysun@twitter.com); Jianzhong Qi (University of Melbourne, jianzhong.qi@unimelb.edu.au) |
| Pseudocode | Yes | Algorithm 1: Minibatch stochastic gradient descent training of KDGAN (see the minimal training-loop sketch after this table). |
| Open Source Code | Yes | The code and the data are made available at https://github.com/xiaojiew1/KDGAN/. |
| Open Datasets | Yes | We use the widely adopted MNIST [27] and CIFAR-10 [26] datasets. We use the Yahoo Flickr Creative Commons 100 Million (YFCC100M) dataset [45] in the experiments. |
| Dataset Splits | No | The paper mentions tuning hyperparameters 'based on validation performance' but does not provide specific split sizes for a validation set, only for the training and test sets. |
| Hardware Specification | No | The paper mentions general hardware contexts like 'powerful server' vs. 'mobile phone' for the problem description, but does not specify the actual CPU, GPU, or other hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions 'TensorFlow [1]' but does not provide a specific version number. Other components such as VGGNet, LSTMs, and word embeddings are also mentioned without version details. |
| Experiment Setup | Yes | We use two formulations of the distillation losses including the L2 loss [7] and the Kullback-Leibler divergence [23]. We search for the optimal values for the hyperparameters α in [0.0, 1.0], β in [0.001, 1000], and γ in [0.0001, 100] based on validation performance. We find that a reasonable annealing schedule for the temperature parameter τ is to start with a large value (1.0) and exponentially decay it to a small value (0.1). (An example annealing schedule is sketched after the table.) |
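
The Pseudocode row refers to Algorithm 1, minibatch stochastic gradient descent training of KDGAN. For orientation only, the sketch below shows what one such minibatch step could look like: a discriminator learns to separate true labels from pseudo labels sampled via Gumbel-Softmax at temperature τ from a classifier and a teacher, while the classifier and teacher are updated to fool the discriminator and to distill from each other. This is a minimal PyTorch-style sketch, not the authors' TensorFlow implementation; the toy linear models, the weights `w_adv`/`w_dst` (how they relate to the paper's α, β, γ is an assumption), and the exact form of each loss term are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torch import nn

# Toy dimensions; the real experiments use image models on MNIST/CIFAR-10 and YFCC100M.
NUM_FEATURES, NUM_CLASSES = 784, 10

classifier = nn.Linear(NUM_FEATURES, NUM_CLASSES)         # compact student to be deployed
teacher = nn.Linear(NUM_FEATURES, NUM_CLASSES)            # teacher (a larger network in the paper)
discriminator = nn.Linear(NUM_FEATURES + NUM_CLASSES, 1)  # scores an (input, label) pair as true vs. pseudo

opt_c = torch.optim.Adam(classifier.parameters(), lr=1e-3)
opt_t = torch.optim.Adam(teacher.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

# Illustrative trade-off weights between adversarial and distillation terms;
# in the paper these trade-offs are tuned on validation data.
w_adv, w_dst = 1.0, 1.0

def d_score(x, y):
    """Discriminator logit for an (input, label-distribution) pair."""
    return discriminator(torch.cat([x, y], dim=-1)).squeeze(-1)

def kl_distill(student_logits, teacher_logits, tau):
    """KL-divergence distillation loss between temperature-softened predictions."""
    p_t = F.softmax(teacher_logits / tau, dim=-1)
    return F.kl_div(F.log_softmax(student_logits / tau, dim=-1), p_t, reduction="batchmean")

def train_step(x, y, tau):
    """One minibatch update of discriminator, classifier, and teacher (Algorithm 1, schematically)."""
    y_true = F.one_hot(y, NUM_CLASSES).float()
    ones, zeros = torch.ones(len(x)), torch.zeros(len(x))

    # 1) Discriminator: tell true labels apart from pseudo labels sampled via Gumbel-Softmax.
    with torch.no_grad():
        y_c = F.gumbel_softmax(classifier(x), tau=tau)
        y_t = F.gumbel_softmax(teacher(x), tau=tau)
    d_loss = (F.binary_cross_entropy_with_logits(d_score(x, y_true), ones)
              + 0.5 * F.binary_cross_entropy_with_logits(d_score(x, y_c), zeros)
              + 0.5 * F.binary_cross_entropy_with_logits(d_score(x, y_t), zeros))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Classifier (student): fool the discriminator and distill from the teacher.
    c_logits = classifier(x)
    adv_c = F.binary_cross_entropy_with_logits(d_score(x, F.gumbel_softmax(c_logits, tau=tau)), ones)
    c_loss = w_adv * adv_c + w_dst * kl_distill(c_logits, teacher(x).detach(), tau)
    opt_c.zero_grad(); c_loss.backward(); opt_c.step()

    # 3) Teacher: also trained adversarially and distilling from the classifier (mutual distillation).
    t_logits = teacher(x)
    adv_t = F.binary_cross_entropy_with_logits(d_score(x, F.gumbel_softmax(t_logits, tau=tau)), ones)
    t_loss = w_adv * adv_t + w_dst * kl_distill(t_logits, classifier(x).detach(), tau)
    opt_t.zero_grad(); t_loss.backward(); opt_t.step()
```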
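
The Experiment Setup row states that the temperature τ starts at 1.0 and is exponentially decayed to 0.1, but the paper (as quoted here) does not give the exact decay rule. The helper below is therefore only one plausible realisation, a geometric interpolation over training steps; the function name and arguments are hypothetical.

```python
def annealed_tau(step, total_steps, tau_start=1.0, tau_end=0.1):
    """Exponentially decay the Gumbel-Softmax temperature from tau_start to tau_end.

    Assumption: geometric interpolation over the training steps; the paper only
    states the start (1.0) and end (0.1) values of the schedule.
    """
    frac = min(step / max(total_steps, 1), 1.0)
    return tau_start * (tau_end / tau_start) ** frac
```

Such a schedule would be queried once per minibatch, e.g. `tau = annealed_tau(step, total_steps)` before calling `train_step(x, y, tau)` from the sketch above.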