Dropout distillation

Authors: Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder

ICML 2016

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experiments on standard benchmark datasets demonstrate the validity of our method, yielding consistent improvements over conventional dropout." |
| Researcher Affiliation | Collaboration | Samuel Rota Bulò (ROTABULO@FBK.EU), FBK-irst, Trento, Italy; Lorenzo Porzi (PORZI@FBK.EU), FBK-irst, Trento, Italy; Peter Kontschieder (PKONTSCHIEDER@MAPILLARY.COM), Mapillary, Graz, Austria and Microsoft Research, Cambridge, UK |
| Pseudocode | Yes | Algorithm 1: Dropout training; Algorithm 2: Dropout distillation (see the hedged sketch after this table) |
| Open Source Code | No | The paper provides no links to source code for the methodology and no explicit statement about code availability. |
| Open Datasets | Yes | "The CIFAR10 dataset (Krizhevsky & Hinton, 2009) ... The CIFAR100 dataset (Krizhevsky & Hinton, 2009) ... MNIST (LeCun et al., 1998) handwritten digits recognition dataset." |
| Dataset Splits | No | The paper mentions training and testing splits for the datasets but does not explicitly describe a separate validation split or its proportions. |
| Hardware Specification | No | The paper does not report the hardware used to run the experiments (e.g., GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | "All baseline networks are trained using stochastic gradient descent with momentum 0.9 and L2 regularization. ... For Quick we perform 300 training epochs, with a learning rate of 0.05 for the first 200 and 0.005 for the last 100. For dropout distillation we use the same training schedule for all networks: first, we perform 20 epochs using a learning rate equal to the one used in the last iteration of the baseline network, then we reduce the learning rate by a factor of 10 and run the training for 10 additional epochs. For each network, during training we vertically flip each input image with probability 0.5." |
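The paper presents dropout training and dropout distillation only as pseudocode (Algorithms 1 and 2), and no reference implementation is released. The following PyTorch sketch illustrates the general dropout-distillation idea under stated assumptions: a deterministic student is trained to match the Monte Carlo-averaged predictions of a trained dropout teacher through a KL-divergence objective. The function name `distillation_step`, the choice of KL loss, and the number of Monte Carlo samples are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, x, optimizer, n_mc_samples=4):
    """One illustrative dropout-distillation update (not the authors' code).

    The teacher is kept in train() mode so every forward pass samples a
    fresh dropout mask; the student is trained to match the Monte Carlo
    average of the teacher's predictive distribution.
    """
    teacher.train()   # keep dropout active so masks are sampled
    student.train()

    with torch.no_grad():
        # Monte Carlo estimate of the dropout-averaged teacher prediction.
        soft_targets = torch.stack(
            [F.softmax(teacher(x), dim=1) for _ in range(n_mc_samples)]
        ).mean(dim=0)

    # Train the student to match the teacher's soft targets.
    log_q = F.log_softmax(student(x), dim=1)
    loss = F.kl_div(log_q, soft_targets, reduction="batchmean")

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After distillation, a single deterministic forward pass of the student stands in for the averaged dropout predictions at test time, which is the efficiency argument the paper makes against Monte Carlo averaging.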
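The Experiment Setup row quotes the optimizer and learning-rate schedules but no code. Below is a minimal sketch of those reported settings, assuming PyTorch SGD; the `weight_decay` value is an assumption, since the paper only states that L2 regularization is used.

```python
import torch

def make_baseline_optimizer(model, lr=0.05, weight_decay=5e-4):
    # SGD with momentum 0.9 and L2 regularization, as reported;
    # the weight_decay value itself is an assumption.
    return torch.optim.SGD(model.parameters(), lr=lr,
                           momentum=0.9, weight_decay=weight_decay)

def quick_lr(epoch):
    # 'Quick' network schedule: 300 epochs total,
    # lr = 0.05 for the first 200 epochs, 0.005 for the last 100.
    return 0.05 if epoch < 200 else 0.005

def distillation_lr(epoch, final_baseline_lr):
    # Distillation schedule (same for all networks): 20 epochs at the
    # baseline's final learning rate, then 10 epochs at one tenth of it.
    return final_baseline_lr if epoch < 20 else final_baseline_lr / 10.0
```

In a training loop, the schedule functions would set the optimizer's learning rate at the start of each epoch, e.g. `for g in opt.param_groups: g["lr"] = quick_lr(epoch)`.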