Understanding the Gains from Repeated Self-Distillation
Authors: Divyansh Pareek, Simon S. Du, Sewoong Oh
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on regression tasks from the UCI repository show a reduction in the learnt model's risk (MSE) by up to 47%. |
| Researcher Affiliation | Academia | Divyansh Pareek, Simon S. Du, Sewoong Oh; Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA; {dpareek,ssdu,sewoong}@cs.washington.edu |
| Pseudocode | No | The paper presents mathematical equations and derivations, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | Our main contributions are theoretical results; however, we plan to release relevant code on GitHub. |
| Open Datasets | Yes | We implement multi-step SD for real-world regression tasks from the UCI repository [18]. |
| Dataset Splits | Yes | First, split the original dataset into three parts for a Train-Validation-Test split. We divide all datasets in a 30-30-40 split. |
| Hardware Specification | Yes | We note that all experiments run on a single CPU within 60 seconds (wall-clock time). |
| Software Dependencies | No | We utilize sklearn's implementation of Ridge. No specific version number for sklearn is provided. |
| Experiment Setup | Yes | Select a grid of λ values (and ensure that it is large enough so that the optimal λ lies in it). The grid has a factor of 10 difference between consecutive values (e.g., {1, 10, 10^2, …, 10^4}). A minimal sketch of this setup follows the table. |
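
The split and selection procedure described in the table (30-30-40 Train-Validation-Test split, sklearn Ridge, log-spaced λ grid chosen by validation MSE) can be sketched as follows. This is an illustrative sketch, not the authors' released code; the dataset loading, random seeds, and grid endpoints are assumptions.

```python
# Sketch of the reported setup: 30-30-40 train/val/test split and ridge regression
# with the regularization strength chosen on the validation set.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X, y = rng.randn(500, 10), rng.randn(500)  # placeholder for a UCI regression task

# 30% train, then split the remaining 70% into 30% validation and 40% test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.3, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, train_size=3/7, random_state=0)

# Grid with a factor of 10 between consecutive values, e.g. {1, 10, ..., 10^4}
lambdas = [10.0 ** k for k in range(0, 5)]
best_lmbda, best_val_mse = None, np.inf
for lmbda in lambdas:
    model = Ridge(alpha=lmbda).fit(X_train, y_train)
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    if val_mse < best_val_mse:
        best_lmbda, best_val_mse = lmbda, val_mse

final = Ridge(alpha=best_lmbda).fit(X_train, y_train)
print("lambda:", best_lmbda, "test MSE:", mean_squared_error(y_test, final.predict(X_test)))
```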
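The table does not spell out the multi-step self-distillation (SD) procedure itself. Below is a minimal sketch assuming the standard SD recipe, in which each step refits a ridge regressor to the previous model's predictions on the training inputs; the function name and the pure-prediction relabeling are assumptions and may differ from the paper's exact parameterization.

```python
# Hedged sketch of K-step self-distillation for ridge regression: each step
# refits a ridge model on the previous model's training-set predictions.
# (Assumed recipe for illustration; not necessarily the paper's exact method.)
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def repeated_self_distillation(X_train, y_train, X_val, y_val, lmbda, num_steps):
    """Return validation MSE after each SD step (step 0 = plain ridge)."""
    targets = y_train
    val_mses = []
    for _ in range(num_steps + 1):
        model = Ridge(alpha=lmbda).fit(X_train, targets)
        val_mses.append(mean_squared_error(y_val, model.predict(X_val)))
        targets = model.predict(X_train)  # the next student learns from the current teacher
    return val_mses
```

Under the setup in the table, λ (and presumably the number of SD steps) would be selected by validation MSE before reporting test error.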