Diversity Matters When Learning From Ensembles

Authors: Giung Nam, Jongmin Yoon, Yoonho Lee, Juho Lee

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using standard image classification benchmarks, we empirically validate that our distillation method promotes diversities in student network predictions, leading to improved performance, especially in terms of uncertainty estimation.
Researcher Affiliation | Collaboration | KAIST, Daejeon, South Korea; AITRICS, Seoul, South Korea; Stanford University, USA
Pseudocode | Yes | Algorithm 1: Knowledge distillation from deep ensembles with ODS perturbations (see the sketch after this table).
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We compared our methods on CIFAR-10/100 and Tiny ImageNet.
Dataset Splits | Yes | (a) Diversity plots of DE-4 teachers for ResNet-32 on train examples of CIFAR-10. (b) Validation set
Hardware Specification | No | The paper does not explicitly specify the hardware (e.g., CPU, GPU models, or cloud computing instances) used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The important hyperparameters for KD are the pair (α, τ); for CIFAR-10, after a thorough hyperparameter sweep, we decided to stay consistent with the convention of (α, τ) = (0.9, 4) for all methods [Hinton et al., 2015, Cho and Hariharan, 2019, Wang et al., 2020]. For CIFAR-100 and Tiny ImageNet, we used the value (α, τ) = (0.9, 1) for all methods. We fix ODS step-size η to 1/255 across all settings. (These settings are collected in the snippet after the table.)
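
The Pseudocode row names Algorithm 1, knowledge distillation from deep ensembles with ODS perturbations. The PyTorch-style sketch below illustrates one plausible reading of such a training step; it is not the paper's exact Algorithm 1. In particular, the helper names (`ods_perturb`, `kd_step`), the choice to take the ODS gradient through a single randomly chosen teacher, and the use of the ensemble's averaged temperature-scaled softmax as the distillation target are assumptions made for illustration.

```python
import random

import torch
import torch.nn.functional as F


def ods_perturb(x, net, eta=1.0 / 255):
    """Assumed ODS-style perturbation: step along the sign of the input gradient
    of a random linear combination of one network's logits."""
    x = x.clone().detach().requires_grad_(True)
    logits = net(x)
    w = torch.empty_like(logits).uniform_(-1.0, 1.0)  # random output-space direction
    grad = torch.autograd.grad((logits * w).sum(), x)[0]
    return (x + eta * grad.sign()).detach()


def kd_step(student, teachers, x, y, alpha=0.9, tau=4.0, eta=1.0 / 255):
    """One distillation step from a deep ensemble on ODS-perturbed inputs (sketch)."""
    teacher = random.choice(teachers)      # teacher used for the ODS gradient (assumption)
    x_ods = ods_perturb(x, teacher, eta)

    with torch.no_grad():
        # Averaged temperature-scaled soft targets from the ensemble (assumption).
        soft_targets = torch.stack(
            [F.softmax(t(x_ods) / tau, dim=1) for t in teachers]
        ).mean(dim=0)

    kd_loss = F.kl_div(
        F.log_softmax(student(x_ods) / tau, dim=1), soft_targets,
        reduction="batchmean",
    ) * tau ** 2
    ce_loss = F.cross_entropy(student(x), y)  # hard-label term on the clean input (assumption)
    return alpha * kd_loss + (1.0 - alpha) * ce_loss
```

With the (α, τ) values from the Experiment Setup row, this is the standard Hinton-style KD objective, the main change being that the inputs used for the soft-target term are ODS-perturbed.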
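
Separately, the settings quoted in the Experiment Setup row can be collected into a per-dataset configuration. Only the (α, τ) pairs and η = 1/255 come from the quoted text; the dictionary layout and key names below are illustrative.

```python
# Values quoted in the Experiment Setup row; the dict structure itself is illustrative.
KD_HPARAMS = {
    "CIFAR-10":      {"alpha": 0.9, "tau": 4.0},
    "CIFAR-100":     {"alpha": 0.9, "tau": 1.0},
    "Tiny ImageNet": {"alpha": 0.9, "tau": 1.0},
}
ODS_STEP_SIZE = 1.0 / 255  # η, fixed across all settings
```

Here α weights the soft-target (distillation) term against the hard-label term and τ is the softmax temperature, as in the `kd_step` sketch.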