Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
BatchEnsemble: an Alternative Approach to Efficient Ensemble and Lifelong Learning
Authors: Yeming Wen, Dustin Tran, Jimmy Ba
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across CIFAR-10, CIFAR-100, WMT14 EN-DE/EN-FR translation, and out-of-distribution tasks, BatchEnsemble yields competitive accuracy and uncertainties as typical ensembles; the speedup at test time is 3X and memory reduction is 3X at an ensemble of size 4. Empirically, we show that BatchEnsemble has the best trade-off among accuracy, running time, and memory on several deep learning architectures and learning tasks: CIFAR-10/100 classification with ResNet32 (He et al., 2016) and WMT14 EN-DE/EN-FR machine translation with Transformer (Vaswani et al., 2017). |
| Researcher Affiliation | Collaboration | Yeming Wen (1, 2, 3), Dustin Tran (3) & Jimmy Ba (1, 2). Affiliations: (1) University of Toronto, (2) Vector Institute, (3) Google Brain |
| Pseudocode | No | The paper describes its methods using text and mathematical equations, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | A footnote on page 1 mentions 'https://github.com/google/edward2', but it does not explicitly state that the code for the methodology described in this paper is available there. It appears to be a general library. |
| Open Datasets | Yes | CIFAR: We consider two CIFAR datasets, CIFAR-10 and CIFAR-100 (Krizhevsky, 2009). WMT: In machine translation tasks, we consider the standard training datasets WMT16 English-German and WMT14 English-French. ... (Vaswani et al., 2017). Split-CIFAR100 proposed in Rebuffi et al. (2016)... Split-ImageNet: The dataset has the same set of images as ImageNet dataset (Deng et al., 2009). |
| Dataset Splits | Yes | Newstest2013 and Newstest2014 are used as validation set and test set respectively. We consider T = 20 tasks on Split-CIFAR100, following the setup of Lopez-Paz & Ranzato (2017). Split-CIFAR100: It randomly splits the entire dataset into T tasks so each task consists of 100/T classes of images. |
| Hardware Specification | Yes | Experiments are run on 4 NVIDIA P100 GPUs. |
| Software Dependencies | No | The paper mentions deep learning architectures like ResNet32 and Transformer, and a library Edward2, but it does not specify version numbers for any software dependencies, such as deep learning frameworks or libraries. |
| Experiment Setup | Yes | The Transformer base is trained for 100K steps and the Transformer big is trained for 180K steps. We train the model with mini-batch size 128. The learning rate decreases from 0.1 to 0.01, from 0.01 to 0.001 at halfway of training and 75% of training. The weight decay coefficient is set to be 10^-4. |
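
The learning-rate schedule quoted in the Experiment Setup row can be read as a piecewise-constant schedule with drops at 50% and 75% of training. The sketch below is one possible reading, not the authors' code: the function name `step_learning_rate` and the `total_steps` parameter are hypothetical, and the excerpt does not state which experiment the schedule applies to or whether the boundaries are counted in steps or epochs.

```python
def step_learning_rate(step: int, total_steps: int) -> float:
    """Piecewise-constant learning rate (hypothetical helper, not from the paper).

    Assumed reading of the quoted setup:
      - 0.1   for the first half of training,
      - 0.01  from 50% up to 75% of training,
      - 0.001 for the remaining 25%.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if step < 0:
        raise ValueError("step must be non-negative")

    # Integer comparisons avoid floating-point rounding at the boundaries.
    if 2 * step < total_steps:        # step / total_steps < 0.5
        return 0.1
    if 4 * step < 3 * total_steps:    # step / total_steps < 0.75
        return 0.01
    return 0.001


if __name__ == "__main__":
    total = 1000  # illustrative value only; the row gives no step count for this schedule
    for s in (0, 499, 500, 749, 750, 999):
        print(s, step_learning_rate(s, total))
```

The boundaries here are inclusive on the lower side (the drop takes effect exactly at the halfway step); the quoted text does not settle that detail, so treat it as an assumption.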