Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Dropout distillation
Authors: Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder
ICML 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on standard benchmark datasets demonstrate the validity of our method, yielding consistent improvements over conventional dropout. |
| Researcher Affiliation | Collaboration | Samuel Rota Bulò EMAIL FBK-irst, Trento, Italy; Lorenzo Porzi EMAIL FBK-irst, Trento, Italy; Peter Kontschieder EMAIL Mapillary, Graz, Austria; Microsoft Research, Cambridge, UK |
| Pseudocode | Yes | Algorithm 1 Dropout training; Algorithm 2 Dropout distillation |
| Open Source Code | No | The paper does not provide any specific links to source code for the methodology or explicit statements about code availability. |
| Open Datasets | Yes | The CIFAR10 dataset (Krizhevsky & Hinton, 2009) ... The CIFAR100 dataset (Krizhevsky & Hinton, 2009) ... MNIST (LeCun et al., 1998) handwritten digits recognition dataset. |
| Dataset Splits | No | The paper mentions training and testing splits for datasets but does not explicitly detail a separate validation split or its proportions. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | All baseline networks are trained using stochastic gradient descent with momentum 0.9 and L2 regularization. ... For Quick we perform 300 training epochs, with a learning rate of 0.05 for the first 200 and 0.005 for the last 100. For dropout distillation we use the same training schedule for all networks: first, we perform 20 epochs using a learning rate equal to the one used in the last iteration of the baseline network, then we reduce the learning rate by a factor of 10 and run the training for 10 additional epochs. For each network, during training we vertically flip each input image with probability 0.5. |
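
The learning-rate schedule quoted in the Experiment Setup row can be written out as a small sketch. This is an illustration of the quoted numbers for the Quick network only, not the authors' code: the function names, the 0-indexed epoch convention, and the reading of "the one used in the last iteration of the baseline network" as the final baseline learning rate (0.005 for Quick) are assumptions.

```python
# Sketch of the schedules quoted above. Function names and 0-indexed
# epochs are hypothetical; only the numeric values come from the paper.

def baseline_quick_lr(epoch):
    """Learning rate for the Quick baseline (300 epochs in total)."""
    if not 0 <= epoch < 300:
        raise ValueError("epoch must be in [0, 300)")
    # 0.05 for the first 200 epochs, 0.005 for the last 100.
    return 0.05 if epoch < 200 else 0.005


def distillation_lr(epoch, last_baseline_lr):
    """Learning rate for dropout distillation (20 + 10 = 30 epochs)."""
    if not 0 <= epoch < 30:
        raise ValueError("epoch must be in [0, 30)")
    # 20 epochs at the baseline's final rate, then 10 epochs at a
    # rate reduced by a factor of 10.
    return last_baseline_lr if epoch < 20 else last_baseline_lr / 10


if __name__ == "__main__":
    # Assumption: the distillation phase starts from the rate the
    # baseline used in its final epoch.
    last_lr = baseline_quick_lr(299)
    for epoch in (0, 19, 20, 29):
        print(epoch, distillation_lr(epoch, last_lr))
```

Under these assumptions, distillation for Quick adds 30 epochs on top of the 300 baseline epochs. The momentum of 0.9, the L2 regularization, and the flip augmentation mentioned in the row are not modeled here.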