Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

Authors: Zeyuan Allen-Zhu, Yuanzhi Li

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We defer discussion of our empirical results to Section 5, but we highlight some of the findings, as they confirm and justify our theoretical approach to studying ensemble and knowledge distillation in deep learning. Specifically, we give empirical evidence showing that knowledge distillation does not work for random feature mappings, and that ensemble in deep learning is very different from ensemble in random feature mappings (see Figure 1). The response also quotes Figure 1's accuracy numbers for WRN-28-10 on CIFAR-10 and CIFAR-100 (single-model, ensemble, knowledge-distillation, and self-distillation accuracies); a minimal distillation sketch appears after this table.
Researcher Affiliation | Collaboration | Zeyuan Allen-Zhu (Meta FAIR Labs, zeyuanallenzhu@meta.com); Yuanzhi Li (Mohamed bin Zayed University of AI, Yuanzhi.Li@mbzuai.ac.ae)
Pseudocode | No | The paper describes its algorithms and updates using mathematical notation and prose, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology or a link to a code repository. It mentions only that the full version of the paper is on arXiv.
Open Datasets | Yes | The experiments use WRN-28-10 trained on CIFAR-10 and on CIFAR-100, both publicly available benchmark datasets.
Dataset Splits | No | The paper mentions training data (Z) and discusses test accuracy, but it does not specify the train/validation/test splits (e.g., percentages or sample counts) for the datasets used in its empirical results.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as CPU or GPU models or cloud computing specifications.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or specific libraries with their versions).
Experiment Setup | No | The paper describes theoretical parameters such as the learning rate η and the number of iterations T in general terms for its proofs, but it does not provide specific numerical hyperparameters (e.g., learning rate = 0.01, batch size = 64) or detailed training configurations for the empirical experiments presented (e.g., in Figure 1); an illustrative configuration sketch follows the table.
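
To make concrete what the "knowledge distillation / self-distillation" accuracies in the Research Type row refer to, below is a minimal sketch of distilling an ensemble teacher into a single student. This is not the paper's code: the temperature, loss weighting, optimizer settings, and the placeholder models and data loader are illustrative assumptions.

```python
# Minimal sketch: distilling an ensemble "teacher" into a single "student".
# Models and data loader are placeholders; T (temperature), alpha, and the
# optimizer settings are illustrative assumptions, not values from the paper.
import torch
import torch.nn.functional as F

def ensemble_logits(models, x):
    """Average the logits of independently trained models (the 'ensemble')."""
    with torch.no_grad():
        return torch.stack([m(x) for m in models], dim=0).mean(dim=0)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-label KL term (scaled by T^2) plus a hard-label cross-entropy term."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def train_student(student, teachers, loader, epochs=1, lr=0.1):
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = distillation_loss(student(x), ensemble_logits(teachers, x), y)
            loss.backward()
            opt.step()
    return student

# Self-distillation is the special case where the "teacher" is a single model of
# the same architecture as the student: train_student(student, [teacher], loader).
```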
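For context on the Dataset Splits and Experiment Setup rows: the paper does not report splits or training hyperparameters, but CIFAR-10 and CIFAR-100 ship with a fixed 50,000-image training set and 10,000-image test set. The sketch below shows one plausible way such a setup could be specified with torchvision; every numeric value (normalization statistics, batch sizes, the commented optimizer recipe) is an assumption for illustration and does not come from the paper.

```python
# Illustrative only: standard CIFAR-10 split and a typical training configuration.
# The paper does not report these details; every value below is an assumption.
import torch
import torchvision
import torchvision.transforms as transforms

normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616))
train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])
test_tf = transforms.Compose([transforms.ToTensor(), normalize])

# CIFAR-10 ships with a fixed split: 50,000 training images, 10,000 test images.
train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=train_tf)
test_set = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=test_tf)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=256, shuffle=False, num_workers=4)

# A common (hypothetical) recipe for WRN-28-10 on CIFAR: SGD with momentum 0.9,
# weight decay 5e-4, initial learning rate 0.1 with step or cosine decay, ~200 epochs.
```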