Bayesian Knowledge Distillation: A Bayesian Perspective of Distillation with Uncertainty Quantification

Authors: Luyang Fang, Yongkai Chen, Wenxuan Zhong, Ping Ma

ICML 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We evaluate the proposed BKD on both synthetic and real benchmark datasets. We also evaluate BKD on some synthetic datasets, presented in Appendix C. The empirical performance of BKD is demonstrated on both synthetic and real datasets. |
| Researcher Affiliation | Academia | Department of Statistics, University of Georgia, Athens, USA. Correspondence to: Wenxuan Zhong <wenxuan@uga.edu>, Ping Ma <pingma@uga.edu>. |
| Pseudocode | Yes | Algorithm 1 Bayesian Knowledge Distillation (BKD). Input: D = {(x_i, y_i)}_{i=1}^N, h(·,·), τ, λ, r. 1: Get the output p of the teacher model for each data point in D. 2: Calculate the posterior distribution of q = h(x, θ). 3: Generate a Monte Carlo sample of θ: at iteration j, with a subset of m data points D^{(j)} = {(x_i^{(j)}, y_i^{(j)})}_{i=1}^m, generate ξ^{(j)} ~ N(0, I) and generate θ^{(j)} using SGLD as in Equation (9). Output: Monte Carlo sample {θ^{(j)}}_{j=1}^r of θ. (A hedged code sketch of this sampling loop is given after the table.) |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the described methodology. |
| Open Datasets | Yes | We test the proposed BKD method on four benchmark datasets: (1) MNIST, (2) Fashion MNIST, (3) CIFAR-10, and (4) CIFAR-100. Detailed information about the datasets can be found in Appendix D.1. (MNIST (Le Cun, 1998) is a dataset of handwritten digit images with a training set of 60,000 examples and a test set of 10,000 examples.) |
| Dataset Splits | Yes | We consider four different scenarios for generating synthetic data, dividing the data into training, validation, and testing sets in a 7:3:1 ratio for all scenarios. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run its experiments. |
| Software Dependencies | No | The paper mentions the use of the PyTorch torchvision library but does not provide specific version numbers for this or any other software dependencies. |
| Experiment Setup | Yes | Algorithm 1 Bayesian Knowledge Distillation (BKD). Input: D = {(x_i, y_i)}_{i=1}^N, h(·,·), τ, λ, r. (Appendix D.2, MNIST dataset): Specifically, the teacher model employs an MLP with two hidden layers of 1200 hidden nodes. The model uses the ReLU activation function and incorporates a dropout rate of 0.5. The model also incorporates a dropout layer with rate 0.2 for the input. The student model employs an MLP architecture consisting of two hidden layers. These layers have 200 and 100 nodes, respectively. The model uses the ReLU activation function. (A hedged PyTorch sketch of these architectures is given after the table.) |
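
The Algorithm 1 pseudocode quoted in the Pseudocode row describes posterior sampling for the student parameters θ via stochastic gradient Langevin dynamics (SGLD). The paper's Equation (9) and exact BKD posterior are not reproduced here, so the sketch below uses the generic SGLD update with a distillation-style likelihood; the function name, the assumption that the loader yields `(x, p)` pairs with precomputed teacher soft labels `p`, and the roles of `tau`, `lam`, and `step_size` are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of Algorithm 1 (BKD sampling via SGLD); not the authors' code.
import torch
import torch.nn.functional as F


def sgld_sample_bkd(student, loader, n_total, tau=4.0, lam=1e-4,
                    step_size=1e-5, num_samples=100):
    """Draw `num_samples` Monte Carlo samples of the student parameters theta.

    `loader` is assumed to yield minibatches (x, p), where p holds the
    teacher's soft predictions for x, precomputed once as in step 1 of
    Algorithm 1. `lam` acts as a Gaussian-prior precision and `tau` as the
    distillation temperature; both are illustrative assumptions.
    """
    samples = []
    data_iter = iter(loader)
    for j in range(num_samples):
        try:
            x, p = next(data_iter)            # minibatch D^(j) of size m
        except StopIteration:
            data_iter = iter(loader)
            x, p = next(data_iter)

        student.zero_grad()
        log_q = F.log_softmax(student(x) / tau, dim=1)
        # Distillation-style log-likelihood: match the teacher's soft labels p.
        log_lik = (p * log_q).sum()
        # Gaussian prior on theta with precision lam.
        log_prior = -0.5 * lam * sum((w ** 2).sum() for w in student.parameters())
        # Minibatch estimate of the log-posterior (rescale the likelihood term).
        log_post = (n_total / x.size(0)) * log_lik + log_prior
        log_post.backward()

        with torch.no_grad():
            for w in student.parameters():
                xi = torch.randn_like(w)      # xi^(j) ~ N(0, I)
                # SGLD step: gradient ascent on the log-posterior plus Gaussian noise.
                w += 0.5 * step_size * w.grad + (step_size ** 0.5) * xi

        samples.append([w.detach().clone() for w in student.parameters()])
    return samples
```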
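The four benchmark datasets named in the Open Datasets row are all distributed through the torchvision library that the paper cites as a dependency. A minimal loading sketch, assuming a local `./data` root and a plain `ToTensor` transform, is:

```python
# Minimal sketch of loading the four benchmark datasets with torchvision.
# The ./data root and the plain ToTensor() transform are assumptions.
from torchvision import datasets, transforms

transform = transforms.ToTensor()

mnist_train = datasets.MNIST("./data", train=True, download=True, transform=transform)
mnist_test = datasets.MNIST("./data", train=False, download=True, transform=transform)
fashion_train = datasets.FashionMNIST("./data", train=True, download=True, transform=transform)
cifar10_train = datasets.CIFAR10("./data", train=True, download=True, transform=transform)
cifar100_train = datasets.CIFAR100("./data", train=True, download=True, transform=transform)
```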
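The Dataset Splits row quotes a 7:3:1 train/validation/test ratio for the synthetic data. One way to reproduce such a split, assuming the data is wrapped in a torch `Dataset` and using an arbitrary seed, is `torch.utils.data.random_split`:

```python
# Hedged sketch of a 7:3:1 train/validation/test split with torch.utils.data.
import torch
from torch.utils.data import random_split


def split_7_3_1(dataset, seed=0):
    n = len(dataset)
    n_train = round(n * 7 / 11)   # 7, 3, and 1 parts of the whole
    n_val = round(n * 3 / 11)
    n_test = n - n_train - n_val  # remainder goes to the test set
    gen = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test], generator=gen)
```

With the MNIST object from the previous sketch, `train_set, val_set, test_set = split_7_3_1(mnist_train)` would produce the three subsets.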
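The MNIST architectures quoted in the Experiment Setup row translate directly into PyTorch. The sketch below assumes 784-dimensional flattened inputs and 10 output classes, which the quoted text does not state explicitly.

```python
# Hedged sketch of the MNIST teacher/student MLPs described in Appendix D.2.
# Input size 784 (flattened 28x28 images) and 10 output classes are assumptions.
import torch.nn as nn

teacher = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(0.2),        # dropout on the input, rate 0.2
    nn.Linear(784, 1200),
    nn.ReLU(),
    nn.Dropout(0.5),        # dropout on hidden layers, rate 0.5
    nn.Linear(1200, 1200),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(1200, 10),
)

student = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 200),
    nn.ReLU(),
    nn.Linear(200, 100),
    nn.ReLU(),
    nn.Linear(100, 10),
)
```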