Ensemble Distribution Distillation
Authors: Andrey Malinin, Bruno Mlodozeniec, Mark Gales
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The properties of EnD² are investigated on both an artificial dataset and on the CIFAR-10, CIFAR-100 and TinyImageNet datasets, where it is shown that EnD² can approach the classification performance of an ensemble, and outperforms both standard DNNs and Ensemble Distillation on the tasks of misclassification and out-of-distribution input detection. (A minimal sketch of the EnD² loss follows the table.) |
| Researcher Affiliation | Collaboration | Andrey Malinin (Yandex, am969@yandex-team.ru); Bruno Mlodozeniec (Department of Engineering, University of Cambridge, bkm28@cam.ac.uk); Mark Gales (Department of Engineering, University of Cambridge, mjfg@eng.cam.ac.uk) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or a link to open-source code for the methodology described. |
| Open Datasets | Yes | Having confirmed the properties of EnD² on an artificial dataset, we now investigate Ensemble Distribution Distillation on the CIFAR-10 (C10), CIFAR-100 (C100) and TinyImageNet (TIM) (Krizhevsky, 2009; CS231N, 2017) datasets. |
| Dataset Splits | Yes | Table 5: Description of datasets used in the experiments in terms of number of images and classes. CIFAR-10: 50000 train / 10000 test / 10 classes; CIFAR-100: 50000 train / 10000 test / 100 classes; TinyImageNet: 100000 train / 10000 test / 200 classes. |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU types, or cloud instance specs) were provided for running experiments. |
| Software Dependencies | No | The paper names its software stack but gives no version numbers: "All models considered in this work were implemented in Pytorch (Paszke et al., 2017) using a variant of the VGG16 (Simonyan & Zisserman, 2015) architecture for image classification. ... All models were trained using the Adam (Kingma & Ba, 2015) optimizer" |
| Experiment Setup | Yes | Table 6: Training Configurations. η0 is the initial learning rate, T0 is the initial temperature and Annealing refers to whether a temperature annealing schedule was used. The batch size for all models was 128. Dropout rate is quoted in terms of probability of not dropping out a unit. (An illustrative training sketch using these settings follows the table.) |
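
For reference, the objective behind EnD² is compact enough to sketch: the distilled student outputs the concentration parameters α(x) of a Dirichlet, and is trained to maximise the likelihood of the ensemble members' softmax outputs under that Dirichlet. Since no code accompanies the paper, the PyTorch sketch below is a reconstruction; the name `end2_loss`, the log-parameterisation of α, and the simplex clamping are our assumptions rather than the authors' implementation.

```python
import torch
from torch.distributions import Dirichlet

def end2_loss(log_alphas, ensemble_probs, eps=1e-8):
    """Dirichlet negative log-likelihood in the spirit of Ensemble
    Distribution Distillation: maximise the likelihood of each ensemble
    member's categorical distribution under the student's Dirichlet.

    log_alphas:     [B, K]    student outputs (log-concentrations)
    ensemble_probs: [M, B, K] softmax outputs of the M ensemble members
    """
    alphas = log_alphas.exp()            # concentrations alpha > 0
    dist = Dirichlet(alphas)             # B Dirichlets over K classes
    # Keep targets strictly inside the simplex so log_prob stays finite.
    probs = ensemble_probs.clamp_min(eps)
    probs = probs / probs.sum(dim=-1, keepdim=True)
    return -dist.log_prob(probs).mean()  # log_prob broadcasts to [M, B]
```

Exponentiating the network output is one standard way to guarantee positive concentrations; the averaged NLL treats each of the M ensemble members' predictions as an i.i.d. sample from the predicted Dirichlet.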
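The quoted setup (Adam, batch size 128, initial learning rate η0, initial temperature T0 with annealing) can likewise be sketched. The exact annealing schedule is not reproduced above, so the linear decay and the helpers `heat` and `ensemble_probs_for` below are illustrative assumptions, not the paper's code.

```python
import torch

def heat(probs, T, eps=1e-8):
    """Temperature-smooth the ensemble targets: p_k^(1/T), renormalised.
    T > 1 flattens the target distributions early in training."""
    p = probs.clamp_min(eps) ** (1.0 / T)
    return p / p.sum(dim=-1, keepdim=True)

def train_end2(model, loader, ensemble_probs_for, eta0, T0, total_steps):
    """Illustrative EnD2 training loop: Adam with initial learning rate
    eta0, temperature annealed from T0 down to 1 (an assumed linear
    schedule), batch size fixed by the loader (128 in the paper)."""
    opt = torch.optim.Adam(model.parameters(), lr=eta0)
    for step, (x, _) in enumerate(loader):
        frac = min(step / total_steps, 1.0)
        T = 1.0 + (T0 - 1.0) * (1.0 - frac)        # anneal T0 -> 1
        targets = heat(ensemble_probs_for(x), T)   # [M, B, K] targets
        loss = end2_loss(model(x), targets)        # Dirichlet NLL above
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

Per-dataset values of η0, T0, and the annealing flag are given in the paper's Table 6 and are not reproduced here.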