Undistillable: Making A Nasty Teacher That CANNOT teach students
Authors: Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on several datasets demonstrate that our method is effective on both standard KD and data-free KD, providing the desirable KD-immunity to model owners for the first time. Our codes and pre-trained models can be found at https://github.com/VITA-Group/Nasty-Teacher. |
| Researcher Affiliation | Academia | 1University of California, Irvine, 2University of Texas at Austin, 3Texas A&M University, 4Yale University {haoyum3,xhx}@uci.edu,{tianlong.chen, atlaswang}@utexas.edu, tkhu@tamu.edu, chenyu.you@yale.edu |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our codes and pre-trained models can be found at https://github.com/VITA-Group/Nasty-Teacher. |
| Open Datasets | Yes | We explore the effectiveness of our nasty teachers on three representative datasets, i.e., CIFAR-10, CIFAR-100, and Tiny-ImageNet. |
| Dataset Splits | No | The paper mentions training epochs and learning rate schedules but does not explicitly provide information on validation dataset splits or percentages. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimizers like Adam and SGD but does not list specific software dependencies with version numbers (e.g., programming language, libraries, frameworks). |
| Experiment Setup | Yes | The distilling temperature τ_A for self-undermining training is set to 4 for CIFAR-10 and 20 for both CIFAR-100 and Tiny-ImageNet, as suggested in (Yuan et al., 2020). For the selection of ω, 0.004, 0.005, and 0.01 are picked for CIFAR-10, CIFAR-100, and Tiny-ImageNet, respectively. The plain CNN is trained with a learning rate of 1e-3 for 100 epochs and optimized by the Adam optimizer (Kingma & Ba, 2014). Other networks are optimized by the SGD optimizer with momentum 0.9 and weight decay 5e-4. The learning rate is initialized as 0.1. Networks are trained for 160 epochs with the learning rate decayed by a factor of 10 at the 80th and 120th epochs for CIFAR-10, and for 200 epochs with the learning rate decayed by a factor of 5 at the 60th, 120th, and 160th epochs for CIFAR-100 and Tiny-ImageNet. |
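
The experiment-setup row above can be approximated with standard PyTorch components. The sketch below is hypothetical and not the authors' released code (see the linked repository for that): the optimizer, learning-rate milestones, τ_A, and ω values come directly from the quoted text, while the exact form of the self-undermining loss (cross-entropy minus an ω-weighted, temperature-softened KL term against a pretrained "adversarial" network) is an assumption about how those hyperparameters are combined.

```python
# Hypothetical reconstruction of the quoted training configuration.
# Not the authors' implementation; hyperparameters are taken from the paper,
# the loss form is an assumption based on "self-undermining training".
import torch
import torch.nn.functional as F

# Per-dataset hyperparameters quoted in the Experiment Setup row.
CONFIG = {
    "cifar10":       {"tau_a": 4,  "omega": 0.004, "epochs": 160, "milestones": [80, 120],      "gamma": 0.1},
    "cifar100":      {"tau_a": 20, "omega": 0.005, "epochs": 200, "milestones": [60, 120, 160], "gamma": 0.2},
    "tiny_imagenet": {"tau_a": 20, "omega": 0.01,  "epochs": 200, "milestones": [60, 120, 160], "gamma": 0.2},
}


def self_undermining_loss(logits_teacher, logits_adv, labels, tau_a, omega):
    """Assumed objective: keep the nasty teacher accurate (cross-entropy)
    while pushing its temperature-softened outputs away from a pretrained
    adversarial network (i.e., maximizing the KL term)."""
    ce = F.cross_entropy(logits_teacher, labels)
    kl = F.kl_div(
        F.log_softmax(logits_teacher / tau_a, dim=1),
        F.softmax(logits_adv / tau_a, dim=1),
        reduction="batchmean",
    ) * (tau_a ** 2)
    return ce - omega * kl  # minus sign: the KL divergence is maximized


def make_optimizer(model, dataset="cifar100"):
    """SGD with momentum 0.9, weight decay 5e-4, initial lr 0.1, and the
    quoted step schedule. The plain CNN baseline instead uses Adam with
    lr=1e-3 for 100 epochs (not shown here)."""
    cfg = CONFIG[dataset]
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=cfg["milestones"], gamma=cfg["gamma"])
    return optimizer, scheduler
```

Note that the decay factor of 5 in the paper corresponds to `gamma=0.2` in `MultiStepLR`, and the factor of 10 to `gamma=0.1`; the pretrained adversarial network supplying `logits_adv` is assumed to share the nasty teacher's architecture.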