Diversity Matters When Learning From Ensembles
Authors: Giung Nam, Jongmin Yoon, Yoonho Lee, Juho Lee
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using standard image classification benchmarks, we empirically validate that our distillation method promotes diversities in student network predictions, leading to improved performance, especially in terms of uncertainty estimation. |
| Researcher Affiliation | Collaboration | KAIST, Daejeon, South Korea; AITRICS, Seoul, South Korea; Stanford University, USA |
| Pseudocode | Yes | Algorithm 1 Knowledge distillation from deep ensembles with ODS perturbations (a hedged sketch of the ODS step follows the table) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We compared our methods on CIFAR-10/100 and TinyImageNet. |
| Dataset Splits | Yes | (a) Diversity plots of DE-4 teachers for ResNet-32 on train examples of CIFAR-10. (b) Validation set |
| Hardware Specification | No | The paper does not explicitly specify the hardware (e.g., CPU, GPU models, or cloud computing instances) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The important hyperparameters for KD are the pair (α, τ); for CIFAR-10, after a thorough hyperparameter sweep, we decided to stay consistent with the convention of (α, τ) = (0.9, 4) for all methods [Hinton et al., 2015, Cho and Hariharan, 2019, Wang et al., 2020]. For CIFAR-100 and TinyImageNet, we used the value (α, τ) = (0.9, 1) for all methods. We fix ODS step-size η to 1/255 across all settings. (The loss these hyperparameters parameterize is sketched after the table.) |
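
The Pseudocode row names ODS (Output Diversified Sampling) perturbations, which originate in Tashiro et al. (2020): an input is nudged along the input-gradient of a randomly weighted sum of the model's logits, so repeated draws push predictions in diverse output directions. Below is a minimal PyTorch sketch of that single step, assuming a classifier `model` that returns logits; the helper name `ods_perturb` is hypothetical, the default step size matches the η = 1/255 reported in the Experiment Setup row, and the sign step is an assumption (the original ODS normalizes the gradient instead).

```python
import torch

def ods_perturb(model, x, eta=1.0 / 255):
    """One ODS step (Tashiro et al., 2020): perturb x along the gradient of a
    randomly weighted combination of the model's logits (sign-step variant)."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    # Random direction in output space, drawn uniformly from [-1, 1]^C.
    w = torch.empty_like(logits).uniform_(-1.0, 1.0)
    # Gradient of the projected logits with respect to the input.
    grad = torch.autograd.grad((w * logits).sum(), x)[0]
    return (x + eta * grad.sign()).detach()
```

In the paper's Algorithm 1 this perturbation is applied during distillation so the student sees teacher predictions at diversified inputs; the exact placement inside the training loop follows the paper's pseudocode, which this sketch does not reproduce.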
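The (α, τ) pairs in the Experiment Setup row are the weights of the standard Hinton-style KD objective: a temperature-τ softened KL term against the teacher, weighted by α, plus a (1 − α)-weighted cross-entropy on the true labels. A sketch under that standard convention (the function name `kd_loss` is illustrative, not from the paper):

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, alpha=0.9, tau=4.0):
    """Hinton et al. (2015) distillation loss; defaults match the paper's
    reported CIFAR-10 setting (alpha, tau) = (0.9, 4)."""
    soft = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * tau**2  # tau^2 keeps the gradient scale comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

For a deep-ensemble teacher, `teacher_logits` would come from the ensemble; whether members are averaged or sampled per step is determined by the paper's Algorithm 1, not by this sketch.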