Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

DICE: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation

Authors: Alexandre Rame, Matthieu Cord

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We obtain state-of-the-art accuracy results on CIFAR-10/100: for example, an ensemble of 5 networks trained with DICE matches an ensemble of 7 networks trained independently. We further analyze the consequences on calibration, uncertainty estimation, out-of-distribution detection and online co-distillation. In this section, we present our experimental results on the CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009) datasets."
Researcher Affiliation | Collaboration | Alexandre Rame, Sorbonne Université, Paris, France, EMAIL; Matthieu Cord, Sorbonne Université & valeo.ai, Paris, France, EMAIL
Pseudocode | Yes | Algorithm 1: Full DICE Procedure for M = 2 members
Open Source Code | No | The paper states: "We borrowed the evaluation code from https://github.com/uoguelph-mlrg/confidence_estimation (De Vries & Taylor, 2018)." It does not provide an explicit statement or link for open-source code of the proposed method (DICE) itself.
Open Datasets | Yes | "In this section, we present our experimental results on the CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009) datasets."
Dataset Splits | Yes | "Hyperparameters for adversarial training and information bottleneck were fine-tuned on a validation dataset made of 5% of the training dataset, see Appendix D.1. For hyperparameter selection and ablation studies, we train on 95% of the training dataset, and analyze performances on the validation dataset made of the remaining 5%."
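The 95%/5% split quoted above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function name, seed, and shuffling strategy are assumptions; only the split proportions come from the paper.

```python
import random

def split_train_val(indices, val_fraction=0.05, seed=0):
    """Shuffle example indices and carve off a validation subset.

    Mirrors the 95%/5% train/validation split described in the paper;
    the seed and shuffling are illustrative choices, not from the paper.
    """
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    # First n_val shuffled indices become validation, the rest training.
    return shuffled[n_val:], shuffled[:n_val]

# CIFAR-10/100 each provide 50,000 training images.
train_idx, val_idx = split_train_val(range(50_000))
```

With 50,000 training images this yields 47,500 training and 2,500 validation indices.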
Hardware Specification | No | "This work was granted access to the HPC resources of IDRIS under the allocation 20XXAD011011953 made by GENCI."
Software Dependencies | No | The paper mentions general software components such as the ResNet and WideResNet architectures and optimization algorithms such as SGD, but it does not specify version numbers for programming languages or libraries (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | "Following (Chen et al., 2020b), we used SGD with Nesterov with momentum of 0.9, mini-batch size of 128, weight decay of 5e-4, 300 epochs, a standard learning rate scheduler that sets values {0.1, 0.001, 0.0001} at steps {0, 150, 225} for CIFAR-10/100. log(β_ceb) reaches values {100, 10, 2, 1.5, 1} at steps {0, 8, 175, 250, 300}."
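The learning-rate schedule quoted in the setup is a simple step function, which can be sketched as follows. This is a hedged illustration of the quoted values only; the function name is hypothetical, and the remaining hyperparameters (SGD with Nesterov momentum 0.9, batch size 128, weight decay 5e-4) would be passed to the optimizer separately.

```python
def learning_rate(epoch):
    """Step schedule from the quoted setup: the learning rate is set to
    {0.1, 0.001, 0.0001} at epochs {0, 150, 225}, over 300 epochs total."""
    if epoch < 150:
        return 0.1
    if epoch < 225:
        return 0.001
    return 0.0001
```

In a PyTorch-style training loop this would typically be realized with a multi-step scheduler; the quote does not specify the framework, so the plain function above states only the values and breakpoints.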