Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
SLaM: Student-Label Mixing for Distillation with Unlabeled Examples
Authors: Vasilis Kontonis, Fotis Iliopoulos, Khoa Trinh, Cenk Baykal, Gaurav Menghani, Erik Vee
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present a principled method for knowledge distillation with unlabeled examples that we call Student-Label Mixing (SLaM) and we show that it consistently improves over prior approaches by evaluating it on several standard benchmarks. |
| Researcher Affiliation | Collaboration | Vasilis Kontonis (UT Austin); Fotis Iliopoulos, Khoa Trinh, Cenk Baykal, Gaurav Menghani, Erik Vee (Google Research) |
| Pseudocode | Yes | In this section we present pseudo-code describing the distillation with unlabeled examples setting and the SLaM method, Algorithm 1. |
| Open Source Code | Yes | Remark B.1. We remark that in our experiments, we observed that not normalizing the mixing operation with k(x)^{-1} resulted in better results overall. Therefore, the mixing operation used in our experimental evaluation of SLaM is mix(f(x; w); α(x), k(x)) = α(x) f(x; w) + (1 − α(x)) (1 − f(x; w)) ⊙ top(y_s(x); k(x)). For more details we refer the reader to the code provided in the supplementary material. |
| Open Datasets | Yes | CIFAR-{10,100} and CelebA: Here we present our results on CIFAR-{10, 100} [30] and CelebA [22]. ImageNet: Here we present the results on ImageNet [49]. Large Movie Reviews Dataset: Here we present results on the Large Movie Reviews Dataset [39]. |
| Dataset Splits | Yes | For each trial we randomly split dataset C into a small (e.g., 500 examples) validation dataset V and an unlabeled training dataset U. |
| Hardware Specification | Yes | We ran our experiments on 64 Cloud TPU v4s each with two cores. |
| Software Dependencies | No | We implemented all algorithms in Python and used the TensorFlow deep learning library [1]. The paper mentions TensorFlow but does not specify a version number for it or for Python. |
| Experiment Setup | Yes | For the experiments on CIFAR-10/100 and CelebA we use the Adam optimizer with initial learning rate lr = 0.001. We then proceed according to the following learning rate schedule... For SLaM we always use 0.5 as the lower bound for isotonic regression (i.e., the parameter lb in Algorithm 2). |
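The unnormalized mixing operation quoted in the Open Source Code row (Remark B.1) can be sketched as follows. This is a minimal NumPy illustration, not the authors' released implementation: `top` is assumed to zero out all but the largest k(x) entries of the teacher's soft-label vector y_s(x), and all function and variable names here are illustrative.

```python
import numpy as np

def top_k(y_s, k):
    # Keep the k largest entries of the teacher soft-label vector, zero the rest
    # (assumed behavior of the paper's top(y_s(x); k(x)) operator).
    out = np.zeros_like(y_s)
    idx = np.argsort(y_s)[-k:]
    out[idx] = y_s[idx]
    return out

def slam_mix(f_x, y_s, alpha, k):
    # Unnormalized SLaM mixing from Remark B.1:
    #   mix = alpha * f(x; w) + (1 - alpha) * (1 - f(x; w)) * top(y_s(x); k(x))
    # f_x:   student prediction f(x; w), shape (num_classes,)
    # y_s:   teacher soft labels y_s(x), shape (num_classes,)
    # alpha: per-example mixing weight alpha(x) in [0, 1]
    # k:     per-example top-k parameter k(x)
    return alpha * f_x + (1.0 - alpha) * (1.0 - f_x) * top_k(y_s, k)
```

For example, with a 3-class student prediction `[0.2, 0.5, 0.3]`, teacher labels `[0.1, 0.7, 0.2]`, `alpha=0.6`, and `k=1`, the mixed target keeps 60% of the student prediction and adds the down-weighted top-1 teacher label on the remaining mass.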