Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Teacher’s pet: understanding and mitigating biases in distillation
Authors: Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar
TMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on several image classification benchmarks show that these modifications of distillation maintain boost in overall accuracy, while additionally ensuring improvement in subgroup performance. ... We report results on the datasets used in §3: CIFAR-100, ImageNet; and long-tailed (LT) versions of the same. ... Table 3 summarises the results for all methods. |
| Researcher Affiliation | Industry | Michal Lukasik (Google Research), Srinadh Bhojanapalli (Google Research), Aditya Krishna Menon (Google Research), Sanjiv Kumar (Google Research) |
| Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | No | The paper does not contain an unambiguous statement of code release or a direct link to a source code repository. |
| Open Datasets | Yes | Experiments on several image classification benchmarks show that these modifications of distillation maintain boost in overall accuracy, while additionally ensuring improvement in subgroup performance. ... We train a ResNet-56 teacher on CIFAR-100-LT, a long-tailed version of CIFAR-100 (Cui et al., 2019; Cao et al., 2019) ... For ImageNet, we use the long-tailed version from Liu et al. (2019). ... We confirm this can indeed hold on the UCI Adult dataset using random forest models (details in Appendix C.3). |
| Dataset Splits | Yes | For the Ada-* methods, per §4, creating the label-dependent αy requires estimating the teacher's generalisation performance. To do this, we create a random holdout split of the training set. For non-LT datasets, we randomly split into 80% (new train) / 20% (dev). For LT datasets, for each class we hold out k examples into the dev set (k = 50 for Imagenet-LT, k = 20 for CIFAR-100-LT), or half of examples for a class if the total number of per-class examples is ≤ 2k. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions using SGD and ResNet architectures, but it does not specify any software libraries, packages, or their version numbers that would be required to reproduce the experiments. |
| Experiment Setup | Yes | For all datasets, we train using SGD and weight decay 10⁻⁴ for CIFAR, and 0.5 × 10⁻⁴ for ImageNet datasets. ... CIFAR-100. We train for 450 epochs with an initial learning rate of 1.0, with a linear warmup in the first 15 epochs, and an annealed learning rate schedule. We drop the learning rate by a factor of 10 at epochs number: 200, 300 and 400. We use a mini-batch size of 1024. We use SGD with Nesterov momentum of 0.9. For our distillation experiments we train only with the cross-entropy objective against the teacher's logits. For each method we find the best temperature from the list of values: {1, 2, 3, 4, 5}. ImageNet. We train for 90 epochs with an initial learning rate of 0.8, with a linear warmup in the first 5 epochs, and an annealed learning rate schedule. We drop the learning rate by a factor of 10 at epochs number: 30, 60 and 80. We use a mini-batch size of 1024. For our distillation experiments we train with the distillation objective as defined in Equation 1 setting α = 0.2. For each method we fix the temperature to 0.9. Long-tail (LT) datasets. We follow the setup as in the non-long-tail version, except for the learning rate schedule, which we change to follow the cosine schedule (Loshchilov & Hutter, 2017). |
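The per-class holdout rule quoted under Dataset Splits (hold out k examples per class, or half the class when it has fewer than 2k examples) can be sketched as follows; the function and variable names are illustrative, not taken from the paper:

```python
import random
from collections import defaultdict

def long_tail_holdout(examples, k):
    """Split (x, label) pairs into train/dev for a long-tailed dataset.

    Per class, hold out k examples into the dev set (k = 50 for
    ImageNet-LT, k = 20 for CIFAR-100-LT in the quoted setup), or half
    of the class's examples if it has fewer than 2k in total.
    """
    by_class = defaultdict(list)
    for x, y in examples:
        by_class[y].append((x, y))
    train, dev = [], []
    for y, items in by_class.items():
        random.shuffle(items)
        # k examples if the class is large enough, otherwise half of it.
        n_dev = k if len(items) >= 2 * k else len(items) // 2
        dev.extend(items[:n_dev])
        train.extend(items[n_dev:])
    return train, dev
```

For non-LT datasets, the quoted 80%/20% split is a plain random partition of the training set instead.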
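The CIFAR-100 learning-rate recipe quoted above (linear warmup over 15 epochs, then drops by a factor of 10 at epochs 200, 300, and 400) can be sketched as a simple schedule function; this is a sketch of the quoted recipe, not code from the paper:

```python
def lr_schedule(epoch, base_lr=1.0, warmup=15, drops=(200, 300, 400), factor=0.1):
    """Linear warmup followed by step decay, per the quoted CIFAR-100 setup."""
    if epoch < warmup:
        # Ramp linearly from base_lr/warmup up to base_lr.
        return base_lr * (epoch + 1) / warmup
    lr = base_lr
    for d in drops:
        if epoch >= d:
            lr *= factor  # drop by a factor of 10 at each milestone
    return lr
```

For the LT datasets, the paper instead uses a cosine schedule (Loshchilov & Hutter, 2017), which would replace the step-decay branch.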