BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning
Authors: Yeming Wen, Dustin Tran, Jimmy Ba
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across CIFAR-10, CIFAR-100, WMT14 EN-DE/EN-FR translation, and out-of-distribution tasks, BatchEnsemble yields competitive accuracy and uncertainties as typical ensembles; the speedup at test time is 3X and the memory reduction is 3X for an ensemble of size 4. Empirically, we show that BatchEnsemble has the best trade-off among accuracy, running time, and memory on several deep learning architectures and learning tasks: CIFAR-10/100 classification with ResNet32 (He et al., 2016) and WMT14 EN-DE/EN-FR machine translation with Transformer (Vaswani et al., 2017). |
| Researcher Affiliation | Collaboration | Yeming Wen (1,2,3), Dustin Tran (3) & Jimmy Ba (1,2); 1: University of Toronto, 2: Vector Institute, 3: Google Brain |
| Pseudocode | No | The paper describes its methods using text and mathematical equations, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | A footnote on page 1 mentions 'https://github.com/google/edward2', but it does not explicitly state that the code for the methodology described in this paper is available there. It appears to be a general library. |
| Open Datasets | Yes | CIFAR: We consider two CIFAR datasets, CIFAR-10 and CIFAR-100 (Krizhevsky, 2009). WMT: In machine translation tasks, we consider the standard training datasets WMT16 English-German and WMT14 English-French. ... (Vaswani et al., 2017). Split-CIFAR100 proposed in Rebuffi et al. (2016)... Split-ImageNet: The dataset has the same set of images as the ImageNet dataset (Deng et al., 2009). |
| Dataset Splits | Yes | Newstest2013 and Newstest2014 are used as the validation set and test set, respectively. We consider T = 20 tasks on Split-CIFAR100, following the setup of Lopez-Paz & Ranzato (2017). Split-CIFAR100: It randomly splits the entire dataset into T tasks so that each task consists of 100/T classes of images. |
| Hardware Specification | Yes | Experiments are run on 4 NVIDIA P100 GPUs. |
| Software Dependencies | No | The paper mentions deep learning architectures such as ResNet32 and Transformer, and the Edward2 library, but it does not specify version numbers for any software dependencies, such as deep learning frameworks or libraries. |
| Experiment Setup | Yes | The Transformer base is trained for 100K steps and the Transformer big is trained for 180K steps. We train the model with mini-batch size 128. The learning rate decreases from 0.1 to 0.01 at the halfway point of training and from 0.01 to 0.001 at 75% of training. The weight decay coefficient is set to 10^-4. |
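The 3X memory reduction quoted in the Research Type row comes from BatchEnsemble's rank-1 weight parameterization: all ensemble members share one full weight matrix, and each member stores only two extra vectors. The NumPy sketch below is a minimal illustration of that arithmetic for a single dense layer; the layer sizes, sign initialization, and function names are assumptions for illustration, not the authors' edward2 implementation.

```python
import numpy as np

d_in, d_out, ensemble_size = 512, 512, 4

# One slow weight matrix shared by every ensemble member.
W_shared = np.random.randn(d_in, d_out) * 0.01

# Each member i stores only two fast-weight vectors, s_i (input) and r_i (output);
# its effective weight W_shared * outer(s_i, r_i) is never materialized explicitly.
s = np.random.choice([-1.0, 1.0], size=(ensemble_size, d_in))
r = np.random.choice([-1.0, 1.0], size=(ensemble_size, d_out))

def member_forward(x, i):
    # ((x * s_i) @ W_shared) * r_i  ==  x @ (W_shared * np.outer(s[i], r[i]))
    return ((x * s[i]) @ W_shared) * r[i]

x = np.random.randn(8, d_in)  # a small batch of inputs
preds = np.stack([member_forward(x, i) for i in range(ensemble_size)])  # (4, 8, 512)

# Parameter count: one full matrix plus 2 * 4 vectors vs. four full matrices.
print(W_shared.size + s.size + r.size)  # 266,240 parameters
print(ensemble_size * W_shared.size)    # 1,048,576 parameters
```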
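The Split-CIFAR100 protocol quoted in the Dataset Splits row (T = 20 tasks, each with 100/T classes) amounts to a random partition of the 100 class labels. The sketch below shows one way to build such a partition, assuming a hypothetical helper name and a fixed seed; it is not code from the paper.

```python
import numpy as np

def split_cifar100_classes(num_tasks=20, num_classes=100, seed=0):
    """Return `num_tasks` disjoint class subsets covering all CIFAR-100 classes."""
    assert num_classes % num_tasks == 0
    rng = np.random.default_rng(seed)
    classes = rng.permutation(num_classes)
    per_task = num_classes // num_tasks
    return [classes[i * per_task:(i + 1) * per_task] for i in range(num_tasks)]

tasks = split_cifar100_classes()
print(len(tasks), len(tasks[0]))  # 20 tasks, 5 classes each
```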
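The CIFAR schedule quoted in the Experiment Setup row (initial learning rate 0.1, divided by 10 at 50% and again at 75% of training, weight decay 10^-4, mini-batch size 128) corresponds to a piecewise-constant schedule such as the hypothetical sketch below; it is a minimal illustration, not the authors' training code.

```python
def learning_rate(step, total_steps, base_lr=0.1):
    """Piecewise-constant learning rate: divide by 10 at 50% and 75% of training."""
    progress = step / total_steps
    if progress < 0.50:
        return base_lr        # 0.1
    elif progress < 0.75:
        return base_lr / 10   # 0.01
    else:
        return base_lr / 100  # 0.001

BATCH_SIZE = 128
WEIGHT_DECAY = 1e-4

# Example: at 80% of a 100K-step run the learning rate has dropped to 0.001.
print(learning_rate(step=80_000, total_steps=100_000))
```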