Deep Ensembles Work, But Are They Necessary?
Authors: Taiga Abe, Estefany Kelly Buchanan, Geoff Pleiss, Richard Zemel, John P. Cunningham
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We reuse and train a variety of neural networks on two benchmark image classification datasets: CIFAR10 [41] and ImageNet [14]. In particular, we include the 137 CIFAR10 models trained by Miller et al. [52], corresponding to 32 different architectures each trained for 2-5 seeds; as well as the standard 78 ImageNet models curated by Taori et al. [72], each corresponding to a different architecture trained for 1 seed. To form homogeneous ensembles, we additionally train 10 network architectures on CIFAR10 and three on ImageNet. We train 5 independent instances of each model architecture, where each instance differs only in terms of initialization and minibatch ordering. We form homogeneous deep ensembles by combining 4 out of the 5 random seeds. From this process, we can consider 5 single model replicas and 5 ensemble replicas for each model architecture. Unless otherwise stated, ensembles are formed following Eq. (1). (A sketch of this prediction-averaging step follows the table.) |
| Researcher Affiliation | Academia | Taiga Abe¹, E. Kelly Buchanan¹, Geoff Pleiss¹, Richard Zemel¹, John P. Cunningham¹; ¹Columbia University. {ta2507,ekb2154,gmp2162,jpc2181}@columbia.edu; zemel@cs.columbia.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We provide a link to a repository in the supplemental material section Appx. B. This repository contains instructions to reproduce main figures and to download relevant data. |
| Open Datasets | Yes | We reuse and train a variety of neural networks on two benchmark image classification datasets: CIFAR10 [41] and ImageNet [14]. |
| Dataset Splits | No | The paper mentions training and testing on datasets like CIFAR10 and ImageNet but does not explicitly specify the training/validation/test splits (e.g., percentages or sample counts) or clearly define a validation set split in the main text. |
| Hardware Specification | Yes | Appx. B (Computing Infrastructure): All experiments were run on NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper states, 'Our code base is written in Python 3, and utilizes the PyTorch framework [65] to train our neural networks. We utilize standard packages such as NumPy, Pandas, and SciPy.' The Python version (3) is given, but no version numbers are provided for PyTorch or the other libraries. |
| Experiment Setup | No | The paper describes how ensembles were formed (e.g., 'We train 5 independent instances of each model architecture... We form homogeneous deep ensembles by combining 4 out of the 5 random seeds.') but does not include specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) in the main text. It defers such details to an appendix, which refers to another code repository for default hyperparameters. |
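The ensemble-formation procedure quoted under Research Type (combining 4 of 5 seeds per architecture, following Eq. (1) in the paper) amounts, in the usual deep-ensemble construction, to averaging the members' predictive probabilities. The sketch below is a minimal PyTorch illustration under that assumption, not the authors' released implementation; `load_seed_checkpoint` and the input batch are hypothetical placeholders.

```python
# Minimal sketch of homogeneous deep-ensemble prediction, assuming Eq. (1)
# denotes the standard average of per-member softmax probabilities.
import torch
import torch.nn.functional as F


def ensemble_probs(models, x):
    """Average softmax probabilities over ensemble members (one forward pass each)."""
    with torch.no_grad():
        member_probs = [F.softmax(m(x), dim=-1) for m in models]  # each: (batch, n_classes)
    return torch.stack(member_probs).mean(dim=0)                  # (batch, n_classes)


# Hypothetical usage: 4 of the 5 seeds for one architecture, as in the quoted setup.
# `load_seed_checkpoint` is a placeholder loader, not from the paper's repository.
# members = [load_seed_checkpoint(arch="resnet18", seed=s) for s in range(4)]
# predictions = ensemble_probs(members, batch_of_images).argmax(dim=-1)
```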