Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space
Authors: Shangchen Du, Shan You, Xiaojie Li, Jianlong Wu, Fei Wang, Chen Qian, Changshui Zhang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct extensive experiments to demonstrate the effectiveness of our method. We compare our method in the logits-based and feature-based settings with other commonly used ensemble learning methods. CIFAR10 [12], CIFAR100 [11] and ImageNet [3] are used to evaluate the performance. |
| Researcher Affiliation | Collaboration | Shangchen Du (1); affiliations: (1) School of EECS, Peking University; (2) SenseTime; (3) Department of Automation, Tsinghua University; (4) School of Computer Science and Technology, Shandong University; (5) Zhejiang Laboratory; (6) Institute for Artificial Intelligence, Tsinghua University (THUAI); (7) Beijing National Research Center for Information Science and Technology (BNRist). Emails: dushangchen@pku.edu.cn, {youshan,lixiaojie,wangfei,qianchen}@sensetime.com, jlwu1992@sdu.edu.cn, zcs@mail.tsinghua.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is released on https://github.com/AnTuo1998/AE-KD. |
| Open Datasets | Yes | CIFAR10 [12], CIFAR100 [11] and ImageNet [3] are used to evaluate the performance. |
| Dataset Splits | Yes | CIFAR10 [12] consists of 50K training images and 10K test images from 10 classes, while CIFAR100 [11] has the same number of images but from 100 classes. We use resnet56 [8] as the teacher network and train 25 teacher models on both datasets for 240 epochs with the learning rate starting from 0.05 and multiplied by 0.1 at epochs 150, 180, and 210. For resnet20, we train for 350 epochs with the learning rate starting from 0.05 and divided by 10 every 50 epochs starting from the 150th epoch (see the learning-rate schedule sketch after the table). ImageNet [3] contains 1.2M images from 1K classes for training and 50K for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper mentions 'LIBSVM [1]' but does not provide specific version numbers for any software dependencies required to replicate the experiments. |
| Experiment Setup | Yes | We use resnet56 [8] as the teacher network and train 25 teacher models on both datasets for 240 epochs with the learning rate starting from 0.05 and multiplied by 0.1 at epochs 150, 180, and 210. For MobileNetV2, we use the same training strategy as the teachers except that the initial learning rate is 0.01. For resnet20, we train for 350 epochs with the learning rate starting from 0.05 and divided by 10 every 50 epochs starting from the 150th epoch. λ in Eq. (7) is set to 0.9, while β is determined via cross-validation from {0.1, 1, 10, 100, 1000}. The temperature in Eq. (1) is set to 4 (see the distillation-loss sketch after the table). |
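
The multi-step learning-rate schedules quoted above map directly onto a standard PyTorch scheduler. Below is a minimal sketch assuming `torch.optim.lr_scheduler.MultiStepLR`; the placeholder model, momentum, and weight decay are assumptions (common CIFAR defaults), as the quoted text does not report them.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder standing in for resnet56 [8]; the actual CIFAR architecture
# is not reproduced here.
model = nn.Linear(3 * 32 * 32, 100)

# Teacher schedule from the report: 240 epochs, LR starting at 0.05 and
# multiplied by 0.1 at epochs 150, 180, and 210. Momentum and weight decay
# are assumed values, not reported in the quoted text.
optimizer = optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)
scheduler = MultiStepLR(optimizer, milestones=[150, 180, 210], gamma=0.1)

# The resnet20 student variant (350 epochs, LR 0.05 divided by 10 every
# 50 epochs starting from epoch 150) would instead use
# milestones=[150, 200, 250, 300].

for epoch in range(240):
    # ... one training epoch over CIFAR100 would run here ...
    scheduler.step()
```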
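The quoted setup fixes the temperature in Eq. (1) to 4 and λ in Eq. (7) to 0.9. Below is a minimal sketch of a single-teacher softened-logits distillation loss under those values; it assumes the standard Hinton-style KD objective and omits the paper's adaptive multi-teacher weighting in gradient space (and the β term), so the function name `kd_loss` and the exact combination rule are illustrative only.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, lam=0.9):
    """Hinton-style distillation loss with temperature T.

    A generic single-teacher sketch: KL divergence between temperature-
    softened teacher and student distributions (scaled by T^2), blended
    with the hard-label cross-entropy via lam. The paper combines losses
    from 25 teachers with adaptive weights; that step is not shown here.
    """
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_probs = F.log_softmax(student_logits / T, dim=1)
    distill = F.kl_div(log_probs, soft_targets, reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return lam * distill + (1.0 - lam) * ce

# Example usage with random tensors (batch of 8, 100 classes):
s = torch.randn(8, 100)
t = torch.randn(8, 100)
y = torch.randint(0, 100, (8,))
loss = kd_loss(s, t, y)
```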