Dropout distillation
Authors: Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on standard benchmark datasets demonstrate the validity of our method, yielding consistent improvements over conventional dropout. |
| Researcher Affiliation | Collaboration | Samuel Rota Bulò (ROTABULO@FBK.EU), FBK-irst, Trento, Italy; Lorenzo Porzi (PORZI@FBK.EU), FBK-irst, Trento, Italy; Peter Kontschieder (PKONTSCHIEDER@MAPILLARY.COM), Mapillary, Graz, Austria and Microsoft Research, Cambridge, UK |
| Pseudocode | Yes | Algorithm 1 (Dropout training); Algorithm 2 (Dropout distillation). A hedged sketch of the distillation step appears after this table. |
| Open Source Code | No | The paper does not provide any specific links to source code for the methodology or explicit statements about code availability. |
| Open Datasets | Yes | The CIFAR10 dataset (Krizhevsky & Hinton, 2009) ... The CIFAR100 dataset (Krizhevsky & Hinton, 2009) ... MNIST (LeCun et al., 1998) handwritten digit recognition dataset. |
| Dataset Splits | No | The paper mentions training and testing splits for datasets but does not explicitly detail a separate validation split or its proportions. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | All baseline networks are trained using stochastic gradient descent with momentum 0.9 and L2 regularization. ... For Quick we perform 300 training epochs, with a learning rate of 0.05 for the first 200 and 0.005 for the last 100. For dropout distillation we use the same training schedule for all networks: first, we perform 20 epochs using a learning rate equal to the one used in the last iteration of the baseline network, then we reduce the learning rate by a factor of 10 and run the training for 10 additional epochs. For each network, during training we vertically flip each input image with probability 0.5. (An illustrative sketch of this schedule follows the table.) |
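
The Pseudocode row cites Algorithm 1 (Dropout training) and Algorithm 2 (Dropout distillation), but the paper does not release code. The block below is therefore only a minimal PyTorch-style sketch of the distillation idea: a deterministic student is fitted to the Monte Carlo average of a dropout teacher's predictions. All names (`teacher`, `student`, `mc_dropout_target`, `num_mc_samples`) and the choice of a KL divergence to soft targets as the fitting loss are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of dropout distillation: fit a deterministic student to the
# Monte Carlo dropout predictions of a trained teacher network.
# Function and variable names are illustrative, not from the paper.
import torch
import torch.nn.functional as F

def mc_dropout_target(teacher, x, num_mc_samples=32):
    """Average softmax predictions over sampled dropout masks.

    The teacher is kept in train mode so its dropout layers stay stochastic;
    the Monte Carlo mean serves as the distillation target."""
    teacher.train()
    with torch.no_grad():
        probs = torch.stack([F.softmax(teacher(x), dim=1)
                             for _ in range(num_mc_samples)])
    return probs.mean(dim=0)  # approximate dropout-ensemble predictive distribution

def distillation_step(student, teacher, x, optimizer, num_mc_samples=32):
    """One optimizer step fitting the student to the Monte Carlo dropout target.

    Labels are not needed here: the soft targets come from the teacher.
    The student can, for example, be initialized as a copy of the teacher."""
    target = mc_dropout_target(teacher, x, num_mc_samples)
    student.train()
    optimizer.zero_grad()
    log_p = F.log_softmax(student(x), dim=1)
    loss = F.kl_div(log_p, target, reduction="batchmean")  # KL to soft targets (assumed loss)
    loss.backward()
    optimizer.step()
    return loss.item()
```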
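
The Experiment Setup row reports SGD with momentum 0.9 and L2 regularization, and a distillation schedule of 20 epochs at the baseline's final learning rate followed by 10 epochs at one tenth of that rate. The sketch below wires those reported numbers into a training loop, reusing `distillation_step` from the previous sketch; the weight-decay value, the data loader, and the omission of the reported flip augmentation (which would live in the data pipeline) are assumptions.

```python
# Hedged sketch of the reported fine-tuning schedule for dropout distillation:
# SGD with momentum 0.9 and L2 regularization, 20 epochs at the baseline's
# final learning rate, then 10 epochs at that rate divided by 10.
import torch

def run_distillation_schedule(student, teacher, loader, base_lr, weight_decay=5e-4):
    # distillation_step is defined in the previous sketch.
    optimizer = torch.optim.SGD(student.parameters(), lr=base_lr,
                                momentum=0.9,               # momentum as reported
                                weight_decay=weight_decay)  # L2 regularization; value assumed
    # Drop the learning rate by a factor of 10 after the first 20 epochs.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20], gamma=0.1)
    for epoch in range(30):       # 20 epochs at base_lr + 10 epochs at base_lr / 10
        for x, _ in loader:       # labels unused: targets come from the dropout teacher
            distillation_step(student, teacher, x, optimizer)
        scheduler.step()
```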