Towards Understanding Knowledge Distillation

Authors: Mary Phuong, Christoph Lampert

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To experimentally test the effect of data geometry on the effectiveness of distillation, we adopt the setting of Corollary 2. We consider a series of tasks of varying angular alignment, as measured by the degree \kappa of the polynomial by which p(\theta) is upper bounded. Specifically, for any \kappa, the task (P_x^\kappa, w^\kappa) is defined by the following sampling procedure... We use an input space dimension of d = 1000 and a transfer set size of n = 20. Then, we train a linear student by distillation on each of the tasks and evaluate its transfer risk on held-out data. Figure 3 shows the results." (A sketch of this linear-distillation experiment appears after the table.)
Researcher Affiliation | Academia | "IST Austria (Institute of Science and Technology Austria)"
Pseudocode | No | No pseudocode or algorithm blocks are present in the paper.
Open Source Code | No | The paper does not mention providing open-source code for the described methodology.
Open Datasets | Yes | "We train the learners w_\delta for \delta \in \{0, 10, \ldots, 90\} on the digits 0 and 1 of the MNIST dataset"
Dataset Splits | No | "We set the transfer set size to n = 100 and evaluate the risk on the test set." (The quote confirms that a test set is used but gives no details of how the splits were constructed or sized.)
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies or version numbers needed to replicate the experiments.
Experiment Setup | Yes | "We use an input space dimension of d = 1000 and a transfer set size n = 20... We set the transfer set size to n = 100... We train the learners on the polynomial-angle task (P_x^\kappa, w^\kappa) from Section 5.1, with \kappa = 1, d = 100 and n = 5."
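A note on the "Research Type" and "Experiment Setup" rows: in the paper's linear setting, distillation has a closed-form solution, which makes the quoted experiment straightforward to replicate. The sketch below is a minimal, hedged reconstruction: it assumes the characterization from the paper's linear analysis (the distilled student's weight vector equals the projection of the teacher's weights onto the span of the transfer set) and substitutes an isotropic Gaussian input distribution for the paper's polynomial-angle sampling procedure, which is not fully specified in the quotes above. The variable names and evaluation size are illustrative choices, not the authors' code.

```python
# Minimal sketch of the linear-distillation experiment (d = 1000, n = 20).
# Assumptions: isotropic Gaussian inputs stand in for the paper's
# polynomial-angle task, and the student is obtained via the closed form
# "project the teacher's weights onto the span of the transfer set"
# rather than by running gradient descent on the distillation loss.
import numpy as np

rng = np.random.default_rng(0)
d, n, n_test = 1000, 20, 100_000  # input dim, transfer set size, eval size

# Teacher: a fixed linear classifier x -> sign(w_t @ x).
w_t = rng.standard_normal(d)
w_t /= np.linalg.norm(w_t)

# Transfer set (unlabeled inputs shown to the student).
X = rng.standard_normal((n, d))

# Distilled student: orthogonal projection of w_t onto span(rows of X),
# i.e. w_s = X^T (X X^T)^{-1} X w_t.
w_s = X.T @ np.linalg.solve(X @ X.T, X @ w_t)

# Transfer risk: fraction of held-out inputs on which student and
# teacher predictions disagree.
X_test = rng.standard_normal((n_test, d))
risk = np.mean(np.sign(X_test @ w_t) != np.sign(X_test @ w_s))
print(f"estimated transfer risk: {risk:.3f}")
```

With only n = 20 random directions in d = 1000 dimensions, the projection captures little of w_t, so the estimated risk is large. The paper's Figure 3 studies how this risk varies with the angular-alignment parameter \kappa, an effect the simplified Gaussian sampling above does not model.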