Dropout distillation

Authors: Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder

ICML 2016

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experiments on standard benchmark datasets demonstrate the validity of our method, yielding consistent improvements over conventional dropout." |
| Researcher Affiliation | Collaboration | Samuel Rota Bulò (ROTABULO@FBK.EU), FBK-irst, Trento, Italy; Lorenzo Porzi (PORZI@FBK.EU), FBK-irst, Trento, Italy; Peter Kontschieder (PKONTSCHIEDER@MAPILLARY.COM), Mapillary, Graz, Austria and Microsoft Research, Cambridge, UK |
| Pseudocode | Yes | Algorithm 1: Dropout training; Algorithm 2: Dropout distillation (see the hedged sketch after this table) |
| Open Source Code | No | The paper provides no links to source code for the methodology and no explicit statement about code availability. |
| Open Datasets | Yes | "The CIFAR10 dataset (Krizhevsky & Hinton, 2009) ... The CIFAR100 dataset (Krizhevsky & Hinton, 2009) ... MNIST (LeCun et al., 1998) handwritten digits recognition dataset." |
| Dataset Splits | No | The paper mentions training and testing splits for the datasets but does not explicitly describe a separate validation split or its proportions. |
| Hardware Specification | No | The paper does not report the hardware used to run the experiments (e.g., GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | "All baseline networks are trained using stochastic gradient descent with momentum 0.9 and L2 regularization. ... For Quick we perform 300 training epochs, with a learning rate of 0.05 for the first 200 and 0.005 for the last 100. For dropout distillation we use the same training schedule for all networks: first, we perform 20 epochs using a learning rate equal to the one used in the last iteration of the baseline network, then we reduce the learning rate by a factor of 10 and run the training for 10 additional epochs. For each network, during training we vertically flip each input image with probability 0.5." |
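The paper presents dropout training and dropout distillation only as pseudocode (Algorithms 1 and 2), and no reference implementation is released. The following PyTorch sketch illustrates the general dropout-distillation idea under stated assumptions: a deterministic student is trained to match the Monte Carlo-averaged predictions of a trained dropout teacher through a KL-divergence objective. The function name `distillation_step`, the choice of KL loss, and the number of Monte Carlo samples are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, x, optimizer, n_mc_samples=4):
    """One illustrative dropout-distillation update (not the authors' code).

    The teacher is kept in train() mode so every forward pass samples a
    fresh dropout mask; the student is trained to match the Monte Carlo
    average of the teacher's predictive distribution.
    """
    teacher.train()   # keep dropout active so masks are sampled
    student.train()

    with torch.no_grad():
        # Monte Carlo estimate of the dropout-averaged teacher prediction.
        soft_targets = torch.stack(
            [F.softmax(teacher(x), dim=1) for _ in range(n_mc_samples)]
        ).mean(dim=0)

    # Train the student to match the teacher's soft targets.
    log_q = F.log_softmax(student(x), dim=1)
    loss = F.kl_div(log_q, soft_targets, reduction="batchmean")

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After distillation, a single deterministic forward pass of the student stands in for the averaged dropout predictions at test time, which is the efficiency argument the paper makes against Monte Carlo averaging.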
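The Experiment Setup row quotes the optimizer and learning-rate schedules but no code. Below is a minimal sketch of those reported settings, assuming PyTorch SGD; the `weight_decay` value is an assumption, since the paper only states that L2 regularization is used.

```python
import torch

def make_baseline_optimizer(model, lr=0.05, weight_decay=5e-4):
    # SGD with momentum 0.9 and L2 regularization, as reported;
    # the weight_decay value itself is an assumption.
    return torch.optim.SGD(model.parameters(), lr=lr,
                           momentum=0.9, weight_decay=weight_decay)

def quick_lr(epoch):
    # 'Quick' network schedule: 300 epochs total,
    # lr = 0.05 for the first 200 epochs, 0.005 for the last 100.
    return 0.05 if epoch < 200 else 0.005

def distillation_lr(epoch, final_baseline_lr):
    # Distillation schedule (same for all networks): 20 epochs at the
    # baseline's final learning rate, then 10 epochs at one tenth of it.
    return final_baseline_lr if epoch < 20 else final_baseline_lr / 10.0
```

In a training loop, the schedule functions would set the optimizer's learning rate at the start of each epoch, e.g. `for g in opt.param_groups: g["lr"] = quick_lr(epoch)`.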