Deep Ensembles Work, But Are They Necessary?

Authors: Taiga Abe, Estefany Kelly Buchanan, Geoff Pleiss, Richard Zemel, John P. Cunningham

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We reuse and train a variety of neural networks on two benchmark image classification datasets: CIFAR10 [41] and ImageNet [14]. In particular, we include the 137 CIFAR10 models trained by Miller et al. [52], corresponding to 32 different architectures each trained for 2-5 seeds; as well as the standard 78 ImageNet models curated by Taori et al. [72], each corresponding to a different architecture trained for 1 seed. To form homogeneous ensembles, we additionally train 10 network architectures on CIFAR10 and three on ImageNet. We train 5 independent instances of each model architecture, where each instance differs only in terms of initialization and minibatch ordering. We form homogeneous deep ensembles by combining 4 out of the 5 random seeds. From this process, we can consider 5 single model replicas and 5 ensemble replicas for each model architecture. Unless otherwise stated, ensembles are formed following Eq. (1). (See the ensemble-formation sketch after the table.)
Researcher Affiliation | Academia | Taiga Abe¹, E. Kelly Buchanan¹, Geoff Pleiss¹, Richard Zemel¹, John P. Cunningham¹; ¹Columbia University. {ta2507,ekb2154,gmp2162,jpc2181}@columbia.edu, zemel@cs.columbia.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We provide a link to a repository in the supplemental material section Appx. B. This repository contains instructions to reproduce main figures and to download relevant data.
Open Datasets | Yes | We reuse and train a variety of neural networks on two benchmark image classification datasets: CIFAR10 [41] and ImageNet [14]. (See the data-loading sketch after the table.)
Dataset Splits | No | The paper mentions training and testing on CIFAR10 and ImageNet but does not explicitly specify the training/validation/test splits (e.g., percentages or sample counts), nor does it clearly define a validation split in the main text. (An illustrative split appears after the table.)
Hardware Specification | Yes | Appx. B (Computing Infrastructure): "All experiments were run on NVIDIA V100 GPUs."
Software Dependencies | No | The paper states, 'Our code base is written in Python 3, and utilizes the PyTorch framework [65] to train our neural networks. We utilize standard packages such as NumPy, Pandas, and SciPy.' Although the Python major version (3) is given, no specific version numbers are provided for PyTorch or the other libraries. (A version-recording sketch appears after the table.)
Experiment Setup | No | The paper describes how ensembles were formed (e.g., 'We train 5 independent instances of each model architecture... We form homogeneous deep ensembles by combining 4 out of the 5 random seeds.') but does not report specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) in the main text; such details are deferred to an appendix, which in turn points to another code repository for default hyperparameters. (An illustrative hyperparameter record appears after the table.)
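
The sketches below expand on several table rows. All are illustrative Python/PyTorch snippets, not code from the authors' repository.

For the Research Type row, here is a minimal sketch of the homogeneous-ensemble procedure, assuming Eq. (1) denotes a uniform average of the member networks' softmax probabilities; the model dictionary and the input batch `x` are placeholders.

```python
# Minimal sketch of the homogeneous-ensemble procedure quoted in the
# "Research Type" row, assuming Eq. (1) is a uniform average of member
# softmax probabilities. The model dict and input batch are placeholders,
# not the authors' actual code.
from itertools import combinations

import torch


def ensemble_probs(members, x):
    """Average the softmax outputs of the member networks (uniform weights)."""
    with torch.no_grad():
        probs = [torch.softmax(m(x), dim=-1) for m in members]
    return torch.stack(probs).mean(dim=0)


def build_replicas(models_by_seed, x):
    """From 5 single-seed models of one architecture, form 5 single-model
    replicas and 5 ensemble replicas (each ensemble combines 4 of the 5 seeds)."""
    seeds = sorted(models_by_seed)
    singles = {s: ensemble_probs([models_by_seed[s]], x) for s in seeds}
    ensembles = {
        combo: ensemble_probs([models_by_seed[s] for s in combo], x)
        for combo in combinations(seeds, 4)
    }
    return singles, ensembles
```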
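For the Open Datasets row, the following sketch shows how the two benchmarks are commonly loaded with torchvision; the `root` paths are placeholders, and ImageNet must already be available on disk.

```python
# Illustrative torchvision loaders for the two benchmarks; not taken from the
# authors' repository. CIFAR10 downloads automatically; ImageNet must already
# be on disk, and "/path/to/imagenet" is a placeholder.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

cifar_train = datasets.CIFAR10(root="data", train=True, download=True, transform=to_tensor)
cifar_test = datasets.CIFAR10(root="data", train=False, download=True, transform=to_tensor)
imagenet_val = datasets.ImageNet(root="/path/to/imagenet", split="val", transform=to_tensor)
```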
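For the Dataset Splits row, the sketch below shows the kind of explicit train/validation split whose absence is noted; the 45,000/5,000 partition is an assumption for illustration only, not a split reported by the paper.

```python
# Example of an explicit, seeded train/validation split on CIFAR10. The
# 45,000/5,000 partition is illustrative only -- the paper does not report
# such a split.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

cifar_train = datasets.CIFAR10(root="data", train=True, download=True,
                               transform=transforms.ToTensor())
generator = torch.Generator().manual_seed(0)  # fixed seed => reproducible split
train_subset, val_subset = random_split(cifar_train, [45_000, 5_000], generator=generator)
```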
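For the Software Dependencies row, one lightweight way to close the versioning gap is to record the installed versions at run time; the package names below are the standard PyPI distributions, and the exact versions the authors used are unknown.

```python
# Record the installed versions of the otherwise-unversioned dependencies at
# run time (package names are the standard PyPI distributions).
from importlib.metadata import version

for pkg in ("torch", "numpy", "pandas", "scipy"):
    print(f"{pkg}=={version(pkg)}")
```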
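For the Experiment Setup row, the sketch below illustrates the sort of hyperparameter record whose absence is noted; every value is a hypothetical placeholder, not the paper's actual setting (those are deferred to the linked repository).

```python
# Sketch of the kind of hyperparameter record the "Experiment Setup" row finds
# missing. Every value is a hypothetical placeholder, NOT the paper's actual
# setting (those are deferred to the linked code repository).
from dataclasses import dataclass


@dataclass
class TrainConfig:
    architecture: str = "resnet18"  # placeholder
    optimizer: str = "sgd"          # placeholder
    learning_rate: float = 0.1      # placeholder
    batch_size: int = 128           # placeholder
    epochs: int = 200               # placeholder
    seed: int = 0                   # varied across the 5 independent instances


configs = [TrainConfig(seed=s) for s in range(5)]  # one config per random seed
```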