Towards Understanding Knowledge Distillation
Authors: Mary Phuong, Christoph Lampert
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To experimentally test the effect of data geometry on the effectiveness of distillation, we adopt the setting of Corollary 2. We consider a series of tasks of varying angular alignment, as measured by the degree $\kappa$ of the polynomial by which $p(\theta)$ is upper bounded. Specifically, for any $\kappa$, the task $(P_x^\kappa, w_\kappa)$ is defined by the following sampling procedure... We use an input space dimension of $d = 1000$ and a transfer set size $n = 20$. Then, we train a linear student by distillation on each of the tasks and evaluate its transfer risk on held-out data. Figure 3 shows the results. |
| Researcher Affiliation | Academia | IST Austria (Institute of Science and Technology Austria). |
| Pseudocode | No | No pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | No | The paper does not mention providing open-source code for the described methodology. |
| Open Datasets | Yes | We train the learners $w_\delta$ for $\delta \in \{0, 10, \ldots, 90\}$ on the digits 0 and 1 of the MNIST dataset. |
| Dataset Splits | No | We set the transfer set size to $n = 100$ and evaluate the risk on the test set. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers needed to replicate the experiment. |
| Experiment Setup | Yes | We use an input space dimension of $d = 1000$ and a transfer set size $n = 20$... We set the transfer set size to $n = 100$... We train the learners on the polynomial-angle task $(P_x^\kappa, w_\kappa)$ from Section 5.1, with $\kappa = 1$, $d = 100$ and $n = 5$. |
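
To make the Experiment Setup row concrete, here is a minimal sketch of distillation for linear students in the quoted setting ($d = 1000$, $n = 20$). It substitutes an isotropic Gaussian transfer set for the paper's polynomial-angle distribution $(P_x^\kappa, w_\kappa)$, whose exact sampling procedure is not reproduced here, and it uses the paper's observation that gradient descent on the distillation loss drives a linear student to the projection of the teacher's weight vector onto the span of the transfer data, computing that projection in closed form. The teacher, the test-set size, and the random seed are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, n_test = 1000, 20, 5000  # d and n from the paper; n_test is an illustrative choice

# Hypothetical teacher: a random unit-norm linear classifier (stand-in for the
# ground-truth weight vector in the paper).
w_teacher = rng.standard_normal(d)
w_teacher /= np.linalg.norm(w_teacher)

# Transfer set. NOTE: isotropic Gaussian here; the paper samples from the
# polynomial-angle task (P_x^kappa, w_kappa) to control angular alignment.
X = rng.standard_normal((n, d))

# For a linear student, distillation converges to the projection of the
# teacher onto span(X) (Phuong & Lampert, 2019), so we compute that
# projection directly instead of iterating gradient steps.
U, _, _ = np.linalg.svd(X.T, full_matrices=False)  # orthonormal basis of span(X)
w_student = U @ (U.T @ w_teacher)

# Transfer risk: probability that the student's sign prediction disagrees
# with the teacher's, estimated on held-out points.
X_test = rng.standard_normal((n_test, d))
risk = np.mean(np.sign(X_test @ w_student) != np.sign(X_test @ w_teacher))
print(f"estimated transfer risk: {risk:.3f}")
```

With only $n = 20$ transfer points in $d = 1000$ dimensions, the data span is a tiny subspace, which is why the paper's Figure 3 experiment can isolate how the angular alignment of that subspace with the teacher drives the transfer risk.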