Understanding the Gains from Repeated Self-Distillation

Authors: Divyansh Pareek, Simon S. Du, Sewoong Oh

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results on regression tasks from the UCI repository show a reduction in the learnt model's risk (MSE) by up to 47%.
Researcher Affiliation | Academia | Divyansh Pareek, Simon S. Du, Sewoong Oh; Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA; {dpareek,ssdu,sewoong}@cs.washington.edu
Pseudocode | No | The paper presents mathematical equations and derivations, but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | Our main contributions are theoretical results, however we plan to release relevant code on github.
Open Datasets | Yes | We implement multi-step SD for real-world regression tasks from the UCI repository [18].
Dataset Splits | Yes | First, split the original dataset into three parts for a Train-Validation-Test split. We divide all datasets in a 30-30-40 split.
Hardware Specification | Yes | We note that all experiments run on a single CPU within 60 seconds (wall-clock time).
Software Dependencies | No | We utilize sklearn's implementation of the RIDGE. No specific version number for sklearn is provided.
Experiment Setup | Yes | Select a grid of λ values (and ensure that it is large enough so that the optimal λ lies in it). The grid has a factor of 10 difference between consecutive values (e.g., {1, 10, 10^2, ..., 10^4}).
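The rows above describe the experimental pipeline only in prose. The following is a minimal Python sketch of what that pipeline could look like, assuming that "multi-step SD" means repeatedly refitting ridge regression on the previous model's predictions over the training inputs, and that λ is re-tuned on the validation split at every step; the synthetic data, the number of distillation steps, and all function and variable names are illustrative stand-ins, not the authors' released code.

# Hedged sketch of the UCI ridge + multi-step self-distillation setup.
# Assumptions (not taken from the paper's code): each SD step refits Ridge
# on the previous model's predictions over the training inputs, and lambda
# is selected on validation MSE at every step.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def multi_step_sd(X_tr, y_tr, X_val, y_val, lambdas, num_steps=3):
    """Run num_steps rounds of self-distilled ridge regression."""
    labels = y_tr  # step 0 trains on the true labels
    models = []
    for _ in range(num_steps):
        # Pick lambda by validation MSE against the true validation labels.
        best_model = min(
            (Ridge(alpha=lam).fit(X_tr, labels) for lam in lambdas),
            key=lambda m: mean_squared_error(y_val, m.predict(X_val)),
        )
        models.append(best_model)
        labels = best_model.predict(X_tr)  # next step distills from this model
    return models


# Synthetic data standing in for a UCI regression task (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=500)

# 30-30-40 Train-Validation-Test split, as described above.
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, train_size=0.3, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, train_size=3 / 7, random_state=0)

# Grid of lambda values with a factor of 10 between consecutive entries.
lambdas = [1, 10, 100, 1000, 10000]

for step, model in enumerate(multi_step_sd(X_tr, y_tr, X_val, y_val, lambdas)):
    print(f"SD step {step}: test MSE = {mean_squared_error(y_te, model.predict(X_te)):.4f}")

Whether λ is shared across distillation steps or re-tuned per step is a design choice the sketch resolves one way purely for concreteness; the paper's own protocol may differ.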