Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Even your Teacher Needs Guidance: Ground-Truth Targets Dampen Regularization Imposed by Self-Distillation

Authors: Kenneth Borup, Lars N. Andersen

NeurIPS 2021 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirically, it has been shown that this procedure often generalizes better than the model trained merely on the original targets, and achieves higher predictive performance on validation data, despite no additional information being provided during training (Furlanello et al., 2018; Ahn et al., 2019; Yang et al., 2018). Experimental results in Section B can be found at github.com/Kennethborup/self_distillation.
Researcher Affiliation Academia Kenneth Borup, Department of Mathematics, Aarhus University (EMAIL); Lars N. Andersen, Department of Mathematics, Aarhus University (EMAIL)
Pseudocode Yes Algorithm 1: Calculate β̂^(τ) and α*^(τ) for τ ≥ 2. Calculate β̂^(1) from (3) (with any α^(1)); Calculate ŷ^(1) = f(X, β̂^(1)); for t = 2 to τ do: Calculate β̂^(t)_{α=0} from (3) and ŷ^(t)_{α=0} = f(X, β̂^(t)_{α=0}); Solve α*^(t) = argmin_{α∈ℝ} ‖y − (α ŷ^(1) + (1−α) ŷ^(t)_{α=0})‖²; Calculate β̂^(t) from (3) with α*^(t); end for
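The algorithm quoted above can be sketched as runnable code for plain (non-kernel) ridge regression. This is a hedged illustration only: the reading of the paper's equation (3) as a closed-form ridge fit on the α-blended target vector, and the names `ridge_fit` and `self_distill_chain`, are our assumptions, not the authors' implementation.

```python
import numpy as np

def ridge_fit(X, targets, lam=1.0):
    # Closed-form ridge solution: (X^T X + lam*I)^{-1} X^T targets
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ targets)

def self_distill_chain(X, y, tau, lam=1.0):
    """Sketch of Algorithm 1 for linear ridge regression (kernelization omitted).

    Assumes "from (3)" means a ridge fit on the current target vector, and that
    step t fits on the alpha-blend of ground truth y and the previous predictions.
    """
    beta = ridge_fit(X, y, lam)   # step 1: train on the original targets
    y1 = X @ beta                 # first-step predictions  y_hat^(1)
    y_prev = y1
    alphas = []
    for t in range(2, tau + 1):
        # pure self-distillation step (alpha = 0): fit on previous predictions
        beta0 = ridge_fit(X, y_prev, lam)
        y0 = X @ beta0
        # closed-form minimizer of ||y - (a*y1 + (1-a)*y0)||^2 over scalar a
        diff = y1 - y0
        denom = float(diff @ diff)
        a = float(diff @ (y - y0)) / denom if denom > 0 else 0.0
        alphas.append(a)
        # refit on the alpha-blended targets (our reading of "from (3) with alpha*(t)")
        y_prev = X @ ridge_fit(X, a * y + (1 - a) * y_prev, lam)
    return alphas, y_prev
```

For τ = 1 the loop body never runs and the function simply returns the step-1 ridge predictions, matching the base case of the pseudocode.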
Open Source Code Yes Experimental results in Section B can be found at github.com/Kennethborup/self_distillation.
Open Datasets Yes We perform self-distillation with ResNet-50 (He et al., 2016) networks on CIFAR-10 (Krizhevsky and Hinton, 2009)
Dataset Splits No The paper mentions using validation data for comparison, but it does not provide specific details on the dataset split percentages, sample counts, or the methodology used to create these splits.
Hardware Specification No The paper states: "We would like to thank Genome DK and Aarhus University for providing computational resources that contributed to these research results." This statement is too general and does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts).
Software Dependencies No The paper mentions "PyTorch Lightning" in a citation but does not specify a version number. No other specific software with version numbers is provided.
Experiment Setup Yes The model is initialized randomly at each step and trained according to the above with either estimated optimal parameters, α̂^(τ), or fixed α for all steps. We use the network weights from the last iteration of training at each distillation step for the next step, irrespective of whether a better model occurred earlier in the training. Our models are trained for a fixed 75 epochs and each experiment is repeated with 4 different random seeds over 11 chains of distillation steps.
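The protocol described in this row can be sketched as a driver loop. This is a hypothetical outline, not the authors' code: `train_model` and its parameters are placeholder assumptions, and we read "11 chains of distillation steps" as a chain of 11 steps per seed.

```python
def run_experiments(train_model, n_seeds=4, n_steps=11, epochs=75):
    """Sketch of the reported setup: 4 seeds, 11 distillation steps, 75 epochs each.

    `train_model(seed, teacher, epochs)` is a hypothetical callable that trains a
    freshly initialized network, distilling from `teacher` when it is not None.
    """
    results = {}
    for seed in range(n_seeds):
        teacher = None  # step 1 trains on the original labels only
        for step in range(1, n_steps + 1):
            # fresh random initialization at every step, per the quoted setup
            model = train_model(seed=seed, teacher=teacher, epochs=epochs)
            # use the last-epoch weights as the next teacher, even if an
            # earlier epoch had a better validation score
            teacher = model
            results[(seed, step)] = model
    return results
```

With the defaults this yields 4 × 11 = 44 trained models per α configuration.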