Unifying distillation and privileged information

Authors: David Lopez-Paz, Léon Bottou, Bernhard Schölkopf, Vladimir Vapnik

ICLR 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide theoretical and causal insight about the inner workings of generalized distillation, extend it to unsupervised, semisupervised and multitask learning scenarios, and illustrate its efficacy on a variety of numerical simulations on both synthetic and real-world data. [Section 5, Numerical Simulations] We now present some experiments to illustrate when the distillation of privileged information is effective, and when it is not. The necessary Python code to replicate all the following experiments is available at http://github.com/lopezpaz.
Researcher Affiliation | Collaboration | David Lopez-Paz, Facebook AI Research, Paris, France, dlp@fb.com; Léon Bottou, Facebook AI Research, New York, USA, leon@bottou.org; Bernhard Schölkopf, Max Planck Institute for Intelligent Systems, Tübingen, Germany, bs@tuebingen.mpg.de; Vladimir Vapnik, Facebook AI Research and Columbia University, New York, USA, vladimir.vapnik@gmail.com
Pseudocode | No | Then, the process of generalized distillation is as follows: 1. Learn teacher f_t ∈ F_t using the input-output pairs {(x̃_i, y_i)}_{i=1}^n and Eq. 3. 2. Compute teacher soft labels {σ(f_t(x̃_i)/T)}_{i=1}^n, using temperature parameter T > 0. 3. Learn student f_s ∈ F_s using the input-output pairs {(x_i, y_i)}_{i=1}^n, {(x_i, s_i)}_{i=1}^n, Eq. 4, and imitation parameter λ ∈ [0, 1]. [A runnable sketch of these three steps follows the table.]
Open Source Code | Yes | The necessary Python code to replicate all the following experiments is available at http://github.com/lopezpaz.
Open Datasets | Yes | 5. MNIST handwritten digit image classification: The privileged features are the original 28x28 pixels MNIST handwritten digit images (LeCun et al., 1998b). 6. Semisupervised learning: We explore the semisupervised capabilities of generalized distillation on the CIFAR10 dataset (Krizhevsky, 2009). 7. Multitask learning: The SARCOS dataset (Vijayakumar, 2000).
Dataset Splits | No | We use 300 or 500 samples to train both the teacher and the student, and test their accuracies at multiple levels of temperature and imitation on the full test set.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or other computing specifications used for running the experiments.
Software Dependencies | No | The necessary Python code to replicate all the following experiments is available at http://github.com/lopezpaz.
Experiment Setup | Yes | Both student and teacher are neural networks composed of two hidden layers of 20 rectifier linear units and a softmax output layer (the same networks are used in the remaining experiments). The temperature parameter T > 0 controls how much we want to soften or smooth the class-probability predictions from f_t, and the imitation parameter λ ∈ [0, 1] balances the importance between imitating the soft predictions s_i and predicting the true hard labels y_i. [...] distilling the teacher explanations into the student classifier with λ = T = 1. [See the temperature illustration following the table.]
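
The three-step procedure quoted in the Pseudocode row can be made concrete as follows. This is a minimal sketch, assuming PyTorch and synthetic toy data; the authors' released Python code at http://github.com/lopezpaz is the reference implementation, and the helper names (make_net, fit), the toy data, and the hyperparameters below are purely illustrative.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def make_net(d_in, n_classes):
        # Two hidden layers of 20 rectifier linear units, as in the paper's
        # experiment setup; the softmax is applied inside the losses below.
        return nn.Sequential(nn.Linear(d_in, 20), nn.ReLU(),
                             nn.Linear(20, 20), nn.ReLU(),
                             nn.Linear(20, n_classes))

    def fit(net, inputs, loss_fn, epochs=200, lr=1e-2):
        # Generic full-batch training loop (illustrative, not from the paper).
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(net(inputs))
            loss.backward()
            opt.step()
        return net

    # Toy data: x_priv plays the role of the privileged representation x~_i,
    # x the regular representation x_i, y the hard labels y_i.
    n, d_priv, d, n_classes = 200, 5, 50, 2
    x_priv = torch.randn(n, d_priv)
    x = torch.cat([x_priv + 0.5 * torch.randn(n, d_priv),
                   torch.randn(n, d - d_priv)], dim=1)
    y = (x_priv[:, 0] > 0).long()

    T, lam = 1.0, 1.0   # temperature and imitation parameter

    # Step 1: learn the teacher f_t from the privileged pairs (x~_i, y_i).
    teacher = fit(make_net(d_priv, n_classes), x_priv,
                  lambda logits: F.cross_entropy(logits, y))

    # Step 2: compute the teacher's soft labels s_i = sigma(f_t(x~_i) / T).
    with torch.no_grad():
        s = F.softmax(teacher(x_priv) / T, dim=1)

    # Step 3: learn the student f_s from (x_i, y_i) and (x_i, s_i),
    # balancing the two objectives with the imitation parameter lambda.
    def student_loss(logits):
        hard = F.cross_entropy(logits, y)                            # predict the true hard labels
        soft = -(s * F.log_softmax(logits, dim=1)).sum(1).mean()     # imitate the soft labels s_i
        return (1 - lam) * hard + lam * soft

    student = fit(make_net(d, n_classes), x, student_loss)

With lam = 1 the student ignores the hard labels and learns only from the teacher's soft predictions, which matches the λ = T = 1 configuration quoted in the Experiment Setup row.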
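
To illustrate the role of the temperature described in the Experiment Setup row, the short snippet below (same assumptions as above; the logits are made up and not taken from the paper) shows how dividing the teacher's scores by T before the softmax smooths the resulting class probabilities.

    import torch
    import torch.nn.functional as F

    logits = torch.tensor([4.0, 1.0, 0.1])   # made-up teacher scores f_t(x~) for three classes
    for T in (1.0, 2.0, 5.0):
        print(T, F.softmax(logits / T, dim=0))
    # Larger T flattens the distribution, exposing more of the teacher's
    # information about the non-argmax classes; the imitation parameter
    # lambda then weights this soft-label term against the true hard labels.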