Ensemble Distribution Distillation
Authors: Andrey Malinin, Bruno Mlodozeniec, Mark Gales
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The properties of EnD² are investigated on both an artificial dataset and on the CIFAR-10, CIFAR-100 and TinyImageNet datasets, where it is shown that EnD² can approach the classification performance of an ensemble, and outperforms both standard DNNs and Ensemble Distillation on the tasks of misclassification and out-of-distribution input detection. (A minimal sketch of the EnD² loss follows the table.) |
| Researcher Affiliation | Collaboration | Andrey Malinin (Yandex, am969@yandex-team.ru); Bruno Mlodozeniec (Department of Engineering, University of Cambridge, bkm28@cam.ac.uk); Mark Gales (Department of Engineering, University of Cambridge, mjfg@eng.cam.ac.uk) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or a link to open-source code for the methodology described. |
| Open Datasets | Yes | Having confirmed the properties of EnD² on an artificial dataset, we now investigate Ensemble Distribution Distillation on the CIFAR-10 (C10), CIFAR-100 (C100) and TinyImageNet (TIM) (Krizhevsky, 2009; CS231N, 2017) datasets. |
| Dataset Splits | Yes | Table 5: Description of datasets used in the experiments in terms of number of images and classes. CIFAR-10: 50000 train / 10000 test / 10 classes; CIFAR-100: 50000 train / 10000 test / 100 classes; TinyImageNet: 100000 train / 10000 test / 200 classes. |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU types, or cloud instance specs) were provided for running experiments. |
| Software Dependencies | No | The paper names its software stack but gives no version numbers: "All models considered in this work were implemented in Pytorch (Paszke et al., 2017) using a variant of the VGG16 (Simonyan & Zisserman, 2015) architecture for image classification. ... All models were trained using the Adam (Kingma & Ba, 2015) optimizer" |
| Experiment Setup | Yes | Table 6: Training Configurations. η0 is the initial learning rate, T0 is the initial temperature and Annealing refers to whether a temperature annealing schedule was used. The batch size for all models was 128. Dropout rate is quoted in terms of probability of not dropping out a unit. (An illustrative training sketch using these settings follows the table.) |
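
For reference, the objective behind EnD² is compact enough to sketch: the distilled student outputs the concentration parameters α(x) of a Dirichlet, and is trained to maximise the likelihood of the ensemble members' softmax outputs under that Dirichlet. Since no code accompanies the paper, the PyTorch sketch below is a reconstruction; the name `end2_loss`, the log-parameterisation of α, and the simplex clamping are our assumptions rather than the authors' implementation.

```python
import torch
from torch.distributions import Dirichlet

def end2_loss(log_alphas, ensemble_probs, eps=1e-8):
    """Dirichlet negative log-likelihood in the spirit of Ensemble
    Distribution Distillation: maximise the likelihood of each ensemble
    member's categorical distribution under the student's Dirichlet.

    log_alphas:     [B, K]    student outputs (log-concentrations)
    ensemble_probs: [M, B, K] softmax outputs of the M ensemble members
    """
    alphas = log_alphas.exp()            # concentrations alpha > 0
    dist = Dirichlet(alphas)             # B Dirichlets over K classes
    # Keep targets strictly inside the simplex so log_prob stays finite.
    probs = ensemble_probs.clamp_min(eps)
    probs = probs / probs.sum(dim=-1, keepdim=True)
    return -dist.log_prob(probs).mean()  # log_prob broadcasts to [M, B]
```

Exponentiating the network output is one standard way to guarantee positive concentrations; the averaged NLL treats each of the M ensemble members' predictions as an i.i.d. sample from the predicted Dirichlet.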
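The quoted setup (Adam, batch size 128, initial learning rate η0, initial temperature T0 with annealing) can likewise be sketched. The exact annealing schedule is not reproduced above, so the linear decay and the helpers `heat` and `ensemble_probs_for` below are illustrative assumptions, not the paper's code.

```python
import torch

def heat(probs, T, eps=1e-8):
    """Temperature-smooth the ensemble targets: p_k^(1/T), renormalised.
    T > 1 flattens the target distributions early in training."""
    p = probs.clamp_min(eps) ** (1.0 / T)
    return p / p.sum(dim=-1, keepdim=True)

def train_end2(model, loader, ensemble_probs_for, eta0, T0, total_steps):
    """Illustrative EnD2 training loop: Adam with initial learning rate
    eta0, temperature annealed from T0 down to 1 (an assumed linear
    schedule), batch size fixed by the loader (128 in the paper)."""
    opt = torch.optim.Adam(model.parameters(), lr=eta0)
    for step, (x, _) in enumerate(loader):
        frac = min(step / total_steps, 1.0)
        T = 1.0 + (T0 - 1.0) * (1.0 - frac)        # anneal T0 -> 1
        targets = heat(ensemble_probs_for(x), T)   # [M, B, K] targets
        loss = end2_loss(model(x), targets)        # Dirichlet NLL above
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

Per-dataset values of η0, T0, and the annealing flag are given in the paper's Table 6 and are not reproduced here.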