SLaM: Student-Label Mixing for Distillation with Unlabeled Examples

Authors: Vasilis Kontonis, Fotis Iliopoulos, Khoa Trinh, Cenk Baykal, Gaurav Menghani, Erik Vee

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we present a principled method for knowledge distillation with unlabeled examples that we call Student-Label Mixing (SLaM), and we show that it consistently improves over prior approaches by evaluating it on several standard benchmarks.
Researcher Affiliation | Collaboration | Vasilis Kontonis (UT Austin, vasilis@cs.utexas.edu); Fotis Iliopoulos (Google Research, fotisi@google.com); Khoa Trinh (Google Research, khoatrinh@google.com); Cenk Baykal (Google Research, baykalc@google.com); Gaurav Menghani (Google Research, gmenghani@google.com); Erik Vee (Google Research, erikvee@google.com)
Pseudocode | Yes | In this section we present pseudo-code describing the distillation with unlabeled examples setting and the SLaM method, Algorithm 1.
Open Source Code | Yes | Remark B.1. We remark that in our experiments, we observed that not normalizing the mixing operation with k(x) − 1 resulted in better results overall. Therefore, the mixing operation used in our experimental evaluation of SLaM is mix(f(x; w); α(x), k(x)) = α(x) f(x; w) + (1 − α(x)) (1 − f(x; w)) top(y_s(x); k(x)). For more details we refer the reader to the code provided in the supplementary material. (A code sketch of this unnormalized mixing operation follows the table.)
Open Datasets | Yes | CIFAR-{10, 100} and CelebA: Here we present our results on CIFAR-{10, 100} [30] and CelebA [22]. ImageNet: Here we present the results on ImageNet [49]. Large Movie Reviews Dataset: Here we present results on the Large Movie Reviews Dataset [39].
Dataset Splits | Yes | For each trial we randomly split dataset C into a small (e.g., 500 examples) validation dataset V and an unlabeled training dataset U. (A sketch of this split follows the table.)
Hardware Specification | Yes | We ran our experiments on 64 Cloud TPU v4s, each with two cores.
Software Dependencies | No | We implemented all algorithms in Python and used the TensorFlow deep learning library [1]. The paper mentions TensorFlow but does not specify a version number for it or for Python.
Experiment Setup | Yes | For the experiments on CIFAR-10/100 and CelebA we use the Adam optimizer with initial learning rate lr = 0.001. We then proceed according to the following learning rate schedule... For SLaM we always use 0.5 as the lower bound for isotonic regression (i.e., the parameter lb in Algorithm 2). (A sketch of this setup follows the table.)
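
To illustrate the mixing operation quoted in the Open Source Code row, the sketch below implements the unnormalized mix from Remark B.1 in TensorFlow. The names top_k_labels, slam_mix, student_probs, and teacher_probs are our own, and reading top(y_s(x); k(x)) as "keep the k largest teacher entries", with a single batch-wide k, is an assumption; the code released with the paper is the authoritative reference.

```python
import tensorflow as tf


def top_k_labels(teacher_probs, k):
    """Zero out all but the k largest entries of each row of teacher_probs.

    Assumed reading of the paper's top(y_s(x); k(x)) operator; a single
    batch-wide k is used for simplicity, whereas the paper's k(x) is
    per-example.
    """
    kth_largest = tf.math.top_k(teacher_probs, k=k).values[:, -1:]
    return tf.where(teacher_probs >= kth_largest,
                    teacher_probs,
                    tf.zeros_like(teacher_probs))


def slam_mix(student_probs, teacher_probs, alpha, k):
    """Unnormalized student-label mixing from Remark B.1 (sketch).

    mix(f; alpha, k) = alpha * f + (1 - alpha) * (1 - f) * top(y_s; k)

    student_probs ~ f(x; w): [batch, classes] student predictions,
    teacher_probs ~ y_s(x):  [batch, classes] teacher pseudo-labels,
    alpha ~ alpha(x):        [batch, 1] per-example mixing weights.
    """
    return (alpha * student_probs
            + (1.0 - alpha) * (1.0 - student_probs)
            * top_k_labels(teacher_probs, k))
```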
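The split described in the Dataset Splits row can be reproduced along the following lines; the function name, the fixed seed, and the use of NumPy are illustrative choices rather than details from the paper, and 500 is the example validation size quoted above.

```python
import numpy as np


def split_clean_dataset(num_examples, val_size=500, seed=0):
    """Randomly split the indices of the clean dataset C into a small
    validation set V and an unlabeled training set U (sketch)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_examples)
    return perm[:val_size], perm[val_size:]  # indices of V, indices of U
```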
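The Experiment Setup row suggests roughly the configuration sketched below. Using Keras's Adam and scikit-learn's IsotonicRegression is an assumption about the implementation; the learning-rate schedule elided in the quote and the inputs to the isotonic fit are not reconstructed here.

```python
import tensorflow as tf
from sklearn.isotonic import IsotonicRegression

# Adam with the quoted initial learning rate; the rest of the schedule is
# elided in the quote above and is not reconstructed here.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

# Isotonic regression with lower bound 0.5 (the parameter lb in Algorithm 2).
# Mapping lb to scikit-learn's y_min is an assumption; the fit's inputs and
# targets are omitted in this sketch.
alpha_model = IsotonicRegression(y_min=0.5, increasing=True, out_of_bounds="clip")
```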