Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

SLaM: Student-Label Mixing for Distillation with Unlabeled Examples

Authors: Vasilis Kontonis, Fotis Iliopoulos, Khoa Trinh, Cenk Baykal, Gaurav Menghani, Erik Vee

NeurIPS 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this paper, we present a principled method for knowledge distillation with unlabeled examples that we call Student-Label Mixing (SLaM) and we show that it consistently improves over prior approaches by evaluating it on several standard benchmarks.
Researcher Affiliation Collaboration Vasilis Kontonis (UT Austin); Fotis Iliopoulos, Khoa Trinh, Cenk Baykal, Gaurav Menghani, Erik Vee (Google Research)
Pseudocode Yes In this section we present pseudocode describing the distillation with unlabeled examples setting and the SLaM method (Algorithm 1).
Open Source Code Yes Remark B.1. We remark that in our experiments, we observed that not normalizing the mixing operation with k(x)^{-1} resulted in better results overall. Therefore, the mixing operation used in our experimental evaluation of SLaM is mix(f(x; w); α(x), k(x)) = α(x) f(x; w) + (1 − α(x)) top(y_s(x); k(x)). For more details we refer the reader to the code provided in the supplementary material.
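The un-normalized mixing operation quoted in the remark above can be sketched in a few lines of NumPy. This is a hypothetical reading of the garbled formula, not the authors' released code: the function names and the `top` masking (keep the k largest entries of the teacher label, zero the rest) are assumptions.

```python
import numpy as np

def top_k_mask(y_s, k):
    # Keep only the k largest entries of the (soft) teacher label y_s,
    # zeroing the rest -- one plausible reading of top(y_s(x); k(x)).
    idx = np.argsort(y_s)[-k:]
    masked = np.zeros_like(y_s)
    masked[idx] = y_s[idx]
    return masked

def slam_mix(f_x, y_s, alpha, k):
    # Un-normalized student-label mixing as quoted in Remark B.1 (sketch):
    #   mix = alpha * f(x; w) + (1 - alpha) * top(y_s(x); k)
    # f_x: student prediction, y_s: teacher label, both probability vectors.
    return alpha * f_x + (1.0 - alpha) * top_k_mask(y_s, k)
```

Note that, per the remark, the `top(...)` term is deliberately not rescaled by k(x)^{-1}; the paper reports this un-normalized variant worked better in practice.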
Open Datasets Yes CIFAR-{10,100} and CelebA: Here we present our results on CIFAR-{10, 100} [30] and CelebA [22]. ImageNet: Here we present the results on ImageNet [49]. Large Movies Reviews Dataset: Here we present results on the Large Movies Reviews Dataset [39].
Dataset Splits Yes For each trial we randomly split dataset C into a small (e.g., 500 examples) validation dataset V and an unlabeled training dataset U.
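The per-trial split described above can be sketched as follows. The function name `split_clean`, the fixed seed, and the plain-list representation are illustrative assumptions, not details from the paper.

```python
import random

def split_clean(C, val_size=500, seed=0):
    # Randomly partition dataset C into a small validation set V
    # (e.g., 500 examples) and an unlabeled training set U, per trial.
    rng = random.Random(seed)
    idx = list(range(len(C)))
    rng.shuffle(idx)
    V = [C[i] for i in idx[:val_size]]
    U = [C[i] for i in idx[val_size:]]
    return V, U
```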
Hardware Specification Yes We ran our experiments on 64 Cloud TPU v4s each with two cores.
Software Dependencies No We implemented all algorithms in Python and used the TensorFlow deep learning library [1]. The paper mentions TensorFlow but does not specify a version number for it or for Python.
Experiment Setup Yes For the experiments on CIFAR-10/100 and CelebA we use the Adam optimizer with initial learning rate lr = 0.001. We then proceed according to the following learning rate schedule... For SLaM we always use 0.5 as the lower bound for isotonic regression (i.e., the parameter lb in Algorithm 2).
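The lower-bounded isotonic regression mentioned above (lb = 0.5) can be sketched with the standard pool-adjacent-violators algorithm followed by a clamp. This is a generic textbook sketch under the assumption that "lower bound" means a box constraint on the fitted values; it is not the paper's Algorithm 2.

```python
def isotonic_fit(y, lb=0.5):
    # Non-decreasing least-squares fit of y via pool adjacent violators,
    # then clamp every fitted value below at lb (SLaM reportedly uses lb = 0.5).
    blocks = []  # each block: [mean, count]
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while the non-decreasing constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    fit = []
    for mean, count in blocks:
        fit.extend([mean] * count)
    # For an L2 fit, clamping the unconstrained isotonic solution at lb
    # yields the solution with the added lower-bound constraint.
    return [max(v, lb) for v in fit]
```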