Knowledge Distillation Performs Partial Variance Reduction
Authors: Mher Safaryan, Alexandra Peste, Dan Alistarh
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our analysis puts further emphasis on the need for careful parametrization of KD, in particular w.r.t. the weighting of the distillation loss, and is validated empirically on both linear models and deep neural networks. |
| Researcher Affiliation | Academia | Mher Safaryan IST Austria mher.safaryan@ista.ac.at Alexandra Peste IST Austria alexandra.peste@ista.ac.at Dan Alistarh IST Austria dan.alistarh@ista.ac.at |
| Pseudocode | Yes | Algorithm 1 Knowledge Distillation via SGD (a hedged training-loop sketch follows the table) |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-source code for the methodology described. |
| Open Datasets | Yes | Specifically, we consider classification problems using linear models in two different setups: training a linear model on the MNIST dataset [24] and linear probing on the CIFAR-10 dataset [23], using a ResNet50 model [12], pre-trained on the ImageNet dataset [42]. |
| Dataset Splits | No | The paper mentions training on MNIST and CIFAR-10 datasets and evaluating performance, but it does not explicitly provide specific percentages, sample counts, or methodologies for how data was split into training, validation, and test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using SGD, but does not specify any software libraries, frameworks, or their version numbers (e.g., PyTorch, TensorFlow, Python version) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | In both cases we train using SGD without momentum and regularization, with a fixed learning rate and mini-batch of size 10, for a total of 100 epochs. |
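
To connect the Pseudocode and Experiment Setup rows, below is a minimal sketch of what "Knowledge Distillation via SGD" with a weighted distillation loss could look like for a linear student. The optimizer settings (plain SGD without momentum or regularization, fixed learning rate, mini-batch size 10, 100 epochs) follow the Experiment Setup row; the distillation weight `lam`, temperature `T`, learning rate value, and synthetic data are illustrative assumptions, and the loss is the common soft-target formulation rather than a verbatim transcription of the paper's Algorithm 1.

```python
# Hedged sketch of knowledge distillation via SGD for a linear student.
# Assumptions (not from the paper): loss weight lam, temperature T, the
# learning rate value, and the synthetic 10-feature / 3-class data.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for a real dataset such as MNIST.
X = torch.randn(1000, 10)          # 1000 samples, 10 features (placeholder)
y = torch.randint(0, 3, (1000,))   # 3 classes (placeholder)
loader = DataLoader(TensorDataset(X, y), batch_size=10, shuffle=True)  # mini-batch of size 10

teacher = torch.nn.Linear(10, 3)   # a trained teacher would be loaded here
student = torch.nn.Linear(10, 3)

# SGD without momentum or regularization, fixed learning rate (per the Experiment Setup row).
opt = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.0, weight_decay=0.0)

lam, T = 0.5, 2.0  # distillation weight and temperature: illustrative values only

for epoch in range(100):  # "for a total of 100 epochs"
    for xb, yb in loader:
        with torch.no_grad():
            t_logits = teacher(xb)                     # teacher soft targets
        s_logits = student(xb)
        ce = F.cross_entropy(s_logits, yb)             # supervised loss on hard labels
        kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                      F.softmax(t_logits / T, dim=1),
                      reduction="batchmean") * T * T   # distillation loss
        loss = (1.0 - lam) * ce + lam * kd             # weighted combination
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The weight `lam` is the quantity the quoted Research Type evidence flags as needing careful parametrization: sweeping it between 0 and 1 interpolates between plain supervised SGD and pure distillation.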
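Similarly, the linear-probing setup quoted in the Open Datasets row could be approximated as below. The use of torchvision's ImageNet-pretrained ResNet-50 weights, the preprocessing, and the frozen-feature workflow are assumptions about tooling the paper does not specify (its Software Dependencies row is "No").

```python
# Hedged sketch of linear probing on CIFAR-10 with an ImageNet-pretrained ResNet-50.
# Tooling (torchvision) and preprocessing choices are assumptions, not from the paper.
import torch
import torchvision

weights = torchvision.models.ResNet50_Weights.IMAGENET1K_V1
backbone = torchvision.models.resnet50(weights=weights)
backbone.fc = torch.nn.Identity()   # drop the classification head to expose 2048-d features
backbone.eval()                     # frozen feature extractor

preprocess = weights.transforms()   # resize/normalize as the pretrained model expects
cifar = torchvision.datasets.CIFAR10(root="data", train=True, download=True,
                                     transform=preprocess)
loader = torch.utils.data.DataLoader(cifar, batch_size=10, shuffle=True)

probe = torch.nn.Linear(2048, 10)   # the linear model actually being trained
opt = torch.optim.SGD(probe.parameters(), lr=0.1, momentum=0.0)

for xb, yb in loader:
    with torch.no_grad():
        feats = backbone(xb)        # frozen ResNet-50 features
    loss = torch.nn.functional.cross_entropy(probe(feats), yb)
    opt.zero_grad()
    loss.backward()
    opt.step()
    break  # single step shown; a full run would iterate for many epochs
```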