A statistical perspective on distillation
Authors: Aditya K Menon, Ankit Singh Rawat, Sashank Reddi, Seungyeon Kim, Sanjiv Kumar
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our findings are verified for linear models, neural networks, and decision trees, on both controlled synthetic and real-world datasets. |
| Researcher Affiliation | Industry | Google Research, New York. Correspondence to: Aditya Krishna Menon <adityakmenon@google.com>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | On CIFAR-100, we train teachers that are ResNets of varying depths, and distill these to a student ResNet of fixed depth 8. ... multiclass retrieval, AMAZONCAT-13K and AMAZONCAT-670K (McAuley & Leskovec, 2013; Bhatia et al., 2015). |
| Dataset Splits | No | The paper mentions training and test sets but does not explicitly provide details about a validation set or its split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions). |
| Experiment Setup | Yes | On CIFAR-100, we train teachers that are ResNets of varying depths, and distill these to a student ResNet of fixed depth 8. ... We use a feedforward teacher model with a single (linear) hidden layer of width 512, trained to minimise the softmax cross-entropy. For the student, we make the hidden layer width 8 for AMAZONCAT-13K and 64 for AMAZONCAT-670K. ... apply temperature scaling with T ∈ {1.0, 1.5, 2.0, . . . , 5.0}. |
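
The temperature sweep in the setup row refers to the standard distillation objective, where both teacher and student logits are passed through a temperature-scaled softmax before computing the cross-entropy. Below is a minimal sketch of that loss, assuming the conventional formulation; the function names and toy data are illustrative and not taken from the paper.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T flattens the distribution.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Cross-entropy from the teacher's temperature-scaled distribution
    # to the student's, averaged over the batch (hypothetical helper,
    # matching the usual distillation objective).
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T) + 1e-12)
    return -(p_teacher * log_p_student).sum(axis=-1).mean()

# Toy usage: a batch of 4 examples with 10 classes, sweeping the same
# temperature grid reported in the setup row.
rng = np.random.default_rng(0)
student = rng.normal(size=(4, 10))
teacher = rng.normal(size=(4, 10))
for T in np.arange(1.0, 5.5, 0.5):
    print(f"T={T:.1f}  loss={distillation_loss(student, teacher, T):.4f}")
```

In practice this term is usually mixed with the ordinary cross-entropy on the true labels via a weighting coefficient; the sketch shows only the distillation component that the temperature grid controls.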