Efficient Knowledge Distillation from Model Checkpoints
Authors: Chaofei Wang, Qisen Yang, Rui Huang, Shiji Song, Gao Huang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experiments verify its effectiveness and applicability." (Abstract); "Our contributions are summarized as follows: By designing two exploratory experiments, we observe the phenomenon... Experiments validate its effectiveness and adaptability." (Introduction and Contributions section) |
| Researcher Affiliation | Academia | Chaofei Wang, Qisen Yang, Rui Huang, Shiji Song, Gao Huang; Department of Automation, Tsinghua University, China; {wangcf18, yangqs19, hr20}@mails.tsinghua.edu.cn; {shijis, gaohuang}@tsinghua.edu.cn |
| Pseudocode | Yes | "Algorithm 1: Distillation with the optimal intermediate teacher." (a hedged sketch of such a distillation step appears below the table) |
| Open Source Code | Yes | "Our code is available at https://github.com/LeapLabTHU/CheckpointKD." (Abstract); "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We provide the URL of code for reproducing the main results." (Checklist 3.a) |
| Open Datasets | Yes | "For generality, we conduct experiments on the CIFAR-100 [36], Tiny-ImageNet [37] and ImageNet [38] datasets with various teacher-student pairs." (Section 3.2); "We only use open source datasets." (Checklist 4.d) |
| Dataset Splits | Yes | "For generality, we conduct experiments on the CIFAR-100 [36], Tiny-ImageNet [37] and ImageNet [38] datasets with various teacher-student pairs." (Section 3.2); "Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Section 3.2, Section 4.2, Section 5 and the Appendix." (Checklist 3.b). These datasets have well-known, predefined train/validation/test splits, which are implicitly used here (see the loading sketch below the table). |
| Hardware Specification | Yes | "All experiments are implemented by PyTorch and run on TITAN Xp GPUs." (Appendix A.1) |
| Software Dependencies | No | "All experiments are implemented by PyTorch and run on TITAN Xp GPUs." (Appendix A.1). PyTorch is named, but no version number is given, and no other software dependencies are listed with versions. |
| Experiment Setup | Yes | "For fair comparison, we search the optimal hyperparameters (i.e., the loss ratio α and the temperature τ) for each teacher-student pair." (Section 3.2); "We train each teacher model for 200 epochs to ensure convergence. We save the intermediate models at the 20th, 40th, ..., 180th epochs as intermediate teachers, and the models at the 200th epoch as full teachers." (Section 3.2). A hedged sketch of this schedule appears below the table. |
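
The "Pseudocode" and "Experiment Setup" rows reference Algorithm 1 together with the searched loss ratio α and temperature τ. The paper's exact objective lives in Algorithm 1; as a point of reference only, here is a minimal sketch of a standard Hinton-style distillation loss parameterized by these two hyperparameters, under one common weighting convention:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, alpha=0.5, tau=4.0):
    """Hedged sketch of a standard distillation loss; alpha and tau play
    the roles of the loss ratio and temperature searched per
    teacher-student pair. Not necessarily the paper's exact formulation."""
    # Hard-label cross-entropy against the ground-truth targets.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-label KL divergence against the (intermediate) teacher,
    # scaled by tau^2 so gradients stay comparable across temperatures.
    kl = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * (tau ** 2)
    return alpha * ce + (1.0 - alpha) * kl
```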
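
The checkpoint schedule quoted in the "Experiment Setup" row (intermediate teachers at epochs 20, 40, ..., 180; full teacher at epoch 200) can be sketched as follows; `train_one_epoch` is a hypothetical placeholder for the usual supervised loop, not the authors' code:

```python
import torch

def train_teacher(model, train_loader, optimizer, device,
                  epochs=200, save_every=20):
    """Hedged sketch of the teacher-training schedule quoted above."""
    for epoch in range(1, epochs + 1):
        train_one_epoch(model, train_loader, optimizer, device)  # hypothetical helper
        if epoch % save_every == 0:
            # Epochs 20..180 yield intermediate teachers; epoch 200 the full teacher.
            role = "full" if epoch == epochs else "intermediate"
            torch.save(model.state_dict(), f"teacher_{role}_epoch{epoch:03d}.pth")
```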
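
For the "Dataset Splits" row: CIFAR-100 ships with a fixed 50,000/10,000 train/test split that torchvision exposes through the `train` flag, which is what "predefined splits, implicitly used" amounts to in practice. A minimal loading sketch; the normalization statistics are the commonly used CIFAR-100 values, not figures taken from the paper:

```python
from torchvision import datasets, transforms

# Commonly used CIFAR-100 channel statistics (assumption, not from the paper).
normalize = transforms.Normalize((0.5071, 0.4865, 0.4409),
                                 (0.2673, 0.2564, 0.2762))
transform = transforms.Compose([transforms.ToTensor(), normalize])

# train=True / train=False selects the predefined 50k/10k split.
train_set = datasets.CIFAR100("./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR100("./data", train=False, download=True, transform=transform)
```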