Self-Distillation Amplifies Regularization in Hilbert Space
Authors: Hossein Mobahi, Mehrdad Farajtabar, Peter Bartlett
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work provides the first theoretical analysis of self-distillation. We focus on fitting a nonlinear function to training data, where the model space is a Hilbert space and fitting is subject to ℓ2 regularization in this function space. We show that self-distillation iterations modify regularization by progressively limiting the number of basis functions that can be used to represent the solution. This implies (as we also verify empirically) that while a few rounds of self-distillation may reduce over-fitting, further rounds may lead to under-fitting and thus worse performance. ... In our experiments, we aim to empirically evaluate our theoretical analysis in the setting of deep networks. ... Both of these phenomena are observed in the four left plots of Figure 3. (A kernel-regression sketch of this basis-limiting iteration appears after the table.) |
| Researcher Affiliation | Collaboration | Hossein Mobahi (hmobahi@google.com), Google Research, Mountain View, CA, USA; Mehrdad Farajtabar (farajtabar@google.com), DeepMind, Mountain View, CA, USA; Peter L. Bartlett (bartlett@eecs.berkeley.edu), EECS Dept., University of California at Berkeley, Berkeley, CA, USA |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Full proofs for these as well as the code for reproducing examples in Section 4 and results in Section 5 are available in the supplementary appendix. |
| Open Datasets | Yes | We use Resnet [12] and VGG [30] neural architectures and train them on CIFAR-10 and CIFAR-100 datasets [18]. |
| Dataset Splits | No | The paper mentions CIFAR-10 and CIFAR-100 but does not specify explicit train/validation/test splits within the main text. It states 'Training details and additional results are left to the appendix.' |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models used for experiments. |
| Software Dependencies | No | The paper does not specify version numbers for any software dependencies, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | No | The paper generally mentions training with 'Resnet' and 'VGG' architectures using 'ℓ2 loss', 'cross-entropy loss', and 'randomly initialized weights'. However, specific hyperparameter values (e.g., learning rate, batch size, number of epochs) and detailed training configurations are not provided in the main text, which states 'Training details and additional results are left to the appendix.' (A hedged training-loop sketch of the described self-distillation protocol follows the table.) |
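
The basis-limiting behavior quoted in the Research Type row can be illustrated numerically. The following is a minimal sketch, not the authors' released code: it iterates kernel ridge regression, feeding each round's fitted values back in as the next round's targets, which is the Hilbert-space picture of self-distillation the paper analyzes. The RBF kernel, bandwidth `gamma`, regularization strength `c`, and the threshold used to count "active" basis directions are all assumptions chosen for illustration.

```python
# Minimal sketch (not from the paper's supplementary code): iterated kernel
# ridge regression as a proxy for self-distillation in an RKHS. Each round
# refits the l2-regularized model to the previous round's predictions, which
# shrinks the contribution of small-eigenvalue basis directions.
import numpy as np

def rbf_kernel(X, Y, gamma=10.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 1.0, size=(30, 1)), axis=0)
y = np.sin(4 * np.pi * X[:, 0]) + 0.3 * rng.normal(size=30)  # noisy 1-D targets

K = rbf_kernel(X, X)
c = 1e-2                                        # assumed l2 regularization strength
A = K @ np.linalg.inv(K + c * np.eye(len(X)))   # maps targets -> fitted values on the sample

eigvals, eigvecs = np.linalg.eigh(K)
targets = y.copy()
for t in range(6):                              # rounds of self-distillation
    fitted = A @ targets                        # regularized fit to the current targets
    coeffs = eigvecs.T @ fitted                 # expansion in the kernel's eigenbasis
    n_active = int((np.abs(coeffs) > 1e-3 * np.abs(coeffs).max()).sum())
    print(f"round {t}: train MSE = {np.mean((fitted - y) ** 2):.4f}, "
          f"active basis directions = {n_active}")
    targets = fitted                            # next round regresses on its own predictions
```

Running this prints a training MSE that grows and an "active basis directions" count that shrinks across rounds, mirroring the paper's claim that early rounds regularize while later rounds under-fit.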
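
For the Experiment Setup row, the main text only outlines the deep-network protocol (train from random initialization, then distill each later round from the previous round's outputs). The sketch below is a hedged reconstruction of such a loop, assuming a PyTorch/torchvision setup; the ResNet-18 architecture, optimizer, learning rate, batch size, epoch count, and the use of an MSE loss on teacher outputs are placeholders, not values reported in the paper's main text.

```python
# Hedged sketch of a self-distillation chain on CIFAR-10: round 0 fits the
# ground-truth labels with cross-entropy; each later round trains a freshly
# initialized network to match the previous round's outputs with an l2 (MSE)
# loss. All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"
train_set = torchvision.datasets.CIFAR10(
    "data", train=True, download=True, transform=T.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)

def fresh_model():
    # Placeholder architecture: torchvision ResNet-18 with 10 output classes.
    return torchvision.models.resnet18(num_classes=10).to(device)

def train_one_round(teacher=None, epochs=1):
    """Train a freshly initialized student; distill from `teacher` if given."""
    student = fresh_model()
    opt = torch.optim.SGD(student.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=5e-4)
    ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()
    student.train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            out = student(x)
            if teacher is None:
                loss = ce(out, y)            # round 0: fit ground-truth labels
            else:
                with torch.no_grad():
                    target = teacher(x)      # previous round's predictions
                loss = mse(out, target)      # later rounds: l2 loss on teacher outputs
            opt.zero_grad()
            loss.backward()
            opt.step()
    student.eval()
    return student

# Chain several rounds of self-distillation.
model = None
for round_idx in range(4):
    model = train_one_round(teacher=model)
```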