Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space
Authors: Shangchen Du, Shan You, Xiaojie Li, Jianlong Wu, Fei Wang, Chen Qian, Changshui Zhang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct extensive experiments to demonstrate the effectiveness of our method. We compare our method in the logits-based and feature-based settings with other commonly used ensemble learning methods. CIFAR10 [12], CIFAR100 [11] and ImageNet [3] are used to evaluate the performance. |
| Researcher Affiliation | Collaboration | Shangchen Du (1); affiliations: (1) School of EECS, Peking University; (2) SenseTime; (3) Department of Automation, Tsinghua University; (4) School of Computer Science and Technology, Shandong University; (5) Zhejiang Laboratory; (6) Institute for Artificial Intelligence, Tsinghua University (THUAI); (7) Beijing National Research Center for Information Science and Technology (BNRist). Emails: dushangchen@pku.edu.cn, {youshan,lixiaojie,wangfei,qianchen}@sensetime.com, jlwu1992@sdu.edu.cn, zcs@mail.tsinghua.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is released on https://github.com/AnTuo1998/AE-KD. |
| Open Datasets | Yes | CIFAR10 [12], CIFAR100 [11] and ImageNet [3] are used to evaluate the performance. |
| Dataset Splits | Yes | CIFAR10 [12] consists of 50K training images and 10K test images from 10 classes, while CIFAR100 [11] has the same number of images but from 100 classes. We use resnet56 [8] as the teacher network and train 25 teacher models on both datasets for 240 epochs with the learning rate starting from 0.05 and multiplied by 0.1 at epochs 150, 180, and 210. For resnet20, we train for 350 epochs with the learning rate starting from 0.05 and divided by 10 every 50 epochs starting from the 150th epoch (see the learning-rate schedule sketch after the table). ImageNet [3] contains 1.2M images from 1K classes for training and 50K for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper mentions 'LIBSVM [1]' but does not provide specific version numbers for any software dependencies required to replicate the experiments. |
| Experiment Setup | Yes | We use resnet56 [8] as the teacher network and train 25 teacher models on both datasets for 240 epochs with the learning rate starting from 0.05 and multiplied by 0.1 at epochs 150, 180, and 210. For MobileNetV2, we use the same training strategy as the teachers except that the initial learning rate is 0.01. For resnet20, we train for 350 epochs with the learning rate starting from 0.05 and divided by 10 every 50 epochs starting from the 150th epoch. λ in Eq. (7) is set to 0.9, while β is determined via cross-validation from {0.1, 1, 10, 100, 1000}. The temperature in Eq. (1) is set to 4 (see the distillation-loss sketch after the table). |
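
The multi-step learning-rate schedules quoted above map directly onto a standard PyTorch scheduler. Below is a minimal sketch assuming `torch.optim.lr_scheduler.MultiStepLR`; the placeholder model, momentum, and weight decay are assumptions (common CIFAR defaults), as the quoted text does not report them.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder standing in for resnet56 [8]; the actual CIFAR architecture
# is not reproduced here.
model = nn.Linear(3 * 32 * 32, 100)

# Teacher schedule from the report: 240 epochs, LR starting at 0.05 and
# multiplied by 0.1 at epochs 150, 180, and 210. Momentum and weight decay
# are assumed values, not reported in the quoted text.
optimizer = optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)
scheduler = MultiStepLR(optimizer, milestones=[150, 180, 210], gamma=0.1)

# The resnet20 student variant (350 epochs, LR 0.05 divided by 10 every
# 50 epochs starting from epoch 150) would instead use
# milestones=[150, 200, 250, 300].

for epoch in range(240):
    # ... one training epoch over CIFAR100 would run here ...
    scheduler.step()
```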
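The quoted setup fixes the temperature in Eq. (1) to 4 and λ in Eq. (7) to 0.9. Below is a minimal sketch of a single-teacher softened-logits distillation loss under those values; it assumes the standard Hinton-style KD objective and omits the paper's adaptive multi-teacher weighting in gradient space (and the β term), so the function name `kd_loss` and the exact combination rule are illustrative only.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, lam=0.9):
    """Hinton-style distillation loss with temperature T.

    A generic single-teacher sketch: KL divergence between temperature-
    softened teacher and student distributions (scaled by T^2), blended
    with the hard-label cross-entropy via lam. The paper combines losses
    from 25 teachers with adaptive weights; that step is not shown here.
    """
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_probs = F.log_softmax(student_logits / T, dim=1)
    distill = F.kl_div(log_probs, soft_targets, reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return lam * distill + (1.0 - lam) * ce

# Example usage with random tensors (batch of 8, 100 classes):
s = torch.randn(8, 100)
t = torch.randn(8, 100)
y = torch.randint(0, 100, (8,))
loss = kd_loss(s, t, y)
```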