Ensemble Distribution Distillation

Authors: Andrey Malinin, Bruno Mlodozeniec, Mark Gales

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The properties of EnD² are investigated on both an artificial dataset, and on the CIFAR-10, CIFAR-100 and Tiny ImageNet datasets, where it is shown that EnD² can approach the classification performance of an ensemble, and outperforms both standard DNNs and Ensemble Distillation on the tasks of misclassification and out-of-distribution input detection."
Researcher Affiliation | Collaboration | Andrey Malinin (Yandex, am969@yandex-team.ru); Bruno Mlodozeniec (Department of Engineering, University of Cambridge, bkm28@cam.ac.uk); Mark Gales (Department of Engineering, University of Cambridge, mjfg@eng.cam.ac.uk)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper provides neither an explicit statement about, nor a link to, open-source code for the described methodology.
Open Datasets | Yes | "Having confirmed the properties of EnD² on an artificial dataset, we now investigate Ensemble Distribution Distillation on the CIFAR-10 (C10), CIFAR-100 (C100) and Tiny ImageNet (TIM) (Krizhevsky, 2009; CS231N, 2017) datasets."
Dataset Splits | Yes | Table 5 ("Description of datasets used in the experiments in terms of number of images and classes") lists, per dataset (Train / Valid / Test / Classes): CIFAR-10 — 50000 10000 10; CIFAR-100 — 50000 10000 100; Tiny Imagenet — 100000 10000 200.
Hardware Specification | No | No specific hardware details (GPU models, CPU types, or cloud instance specifications) are provided for running the experiments.
Software Dependencies | No | "All models considered in this work were implemented in Pytorch (Paszke et al., 2017) using a variant of the VGG16 (Simonyan & Zisserman, 2015) architecture for image classification. ... All models were trained using the Adam (Kingma & Ba, 2015) optimizer"
Experiment Setup | Yes | "Table 6: Training Configurations. η0 is the initial learning rate, T0 is the initial temperature and Annealing refers to whether a temperature annealing schedule was used. The batch size for all models was 128. Dropout rate is quoted in terms of probability of not dropping out a unit."
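The method this report assesses, EnD² (Ensemble Distribution Distillation), trains a single network to predict the parameters α of a Dirichlet distribution that captures the spread of the ensemble members' categorical predictions. A minimal pure-Python sketch of that distillation objective is given below; the function names and toy numbers are illustrative assumptions, not the authors' code.

```python
import math

def dirichlet_log_pdf(pi, alpha):
    """Log-density of a categorical distribution pi under Dirichlet(alpha)."""
    alpha0 = sum(alpha)
    return (math.lgamma(alpha0)
            - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(p) for a, p in zip(alpha, pi)))

def end2_loss(ensemble_probs, alpha):
    """EnD2-style distillation loss: average negative log-likelihood of the
    ensemble members' categorical outputs under the student's Dirichlet."""
    n = len(ensemble_probs)
    return -sum(dirichlet_log_pdf(pi, alpha) for pi in ensemble_probs) / n

# Toy example: three ensemble members on a 3-class problem.
ensemble_probs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]
alpha = [7.0, 2.0, 1.0]  # hypothetical student Dirichlet parameters
loss = end2_loss(ensemble_probs, alpha)
```

A Dirichlet whose mean matches the ensemble's average prediction yields a lower loss than a mismatched one, which is what drives the student towards the ensemble's behaviour during training.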
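Table 6's training configuration mentions an initial temperature T0 and whether a temperature annealing schedule was used. The paper's exact schedule is not reproduced in this report; the sketch below assumes a simple linear anneal from T0 down to 1, applied by tempering the ensemble's target probabilities, purely to illustrate the mechanism.

```python
def annealed_temperature(step, total_steps, t0, t_final=1.0):
    """Linearly anneal the temperature from t0 down to t_final over training.
    The linear form is an illustrative assumption, not the authors' schedule."""
    frac = min(step / max(total_steps, 1), 1.0)
    return t0 + (t_final - t0) * frac

def temper(probs, temperature):
    """Heat a categorical distribution: raise to the power 1/T and renormalise."""
    powered = [p ** (1.0 / temperature) for p in probs]
    z = sum(powered)
    return [p / z for p in powered]

# Early in training (high T) the targets are smoothed towards uniform;
# by the end (T = 1) they are the raw ensemble probabilities.
probs = [0.9, 0.05, 0.05]
early = temper(probs, annealed_temperature(0, 100, t0=10.0))
late = temper(probs, annealed_temperature(100, 100, t0=10.0))
```

Smoothing the targets early on keeps the Dirichlet likelihood well-behaved before the student has learned anything sharp, which is the usual motivation for temperature annealing in distillation setups.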