When are ensembles really effective?

Authors: Ryan Theisen, Hyunsuk Kim, Yaoqing Yang, Liam Hodgkinson, Michael W. Mahoney

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study, both theoretically and empirically, the fundamental question of when ensembling yields significant performance improvements in classification tasks. Theoretically, we prove new results... To complement this theory, we study ensembling empirically in a variety of settings, verifying the predictions made by our theory, and identifying practical scenarios where ensembling does and does not result in large performance improvements.
Researcher Affiliation | Academia | Ryan Theisen, Department of Statistics, University of California, Berkeley (theisen@berkeley.edu); Hyunsuk Kim, Department of Statistics, University of California, Berkeley (hyskim7@berkeley.edu); Yaoqing Yang, Department of Computer Science, Dartmouth College (Yaoqing.Yang@dartmouth.edu); Liam Hodgkinson, School of Mathematics and Statistics, University of Melbourne, Australia (lhodgkinson@unimelb.edu.au); Michael W. Mahoney, International Computer Science Institute, Lawrence Berkeley National Laboratory, and Department of Statistics, University of California, Berkeley (mmahoney@stat.berkeley.edu)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | Table 1 ("Datasets and ensembles used in empirical evaluations, where C denotes the number of classes, and M denotes the number of classifiers") lists: MNIST (5K subset, C=10) [Den12], CIFAR-10 (C=10) [KH+09], IMDB (C=2) [MDP+11], QSAR (C=2) [BGCT19], Thyroid (C=2) [QCHL87], GLUE (7 tasks, C=2-3) [WSM+19].
Dataset Splits | No | The paper notes that "more extensive experimental details can be found in Appendix B.1" (the appendix is not provided in the snippet), but the main text does not specify exact dataset split percentages or sample counts for training, validation, or test sets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used to run its experiments, offering only general statements such as "large scale studies" or "models trained".
Software Dependencies | No | The paper does not specify the ancillary software (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | No | The paper mentions "varying capacity hyper-parameters (width for the ResNet18 models, number of random features for the random feature classifiers, and number of leaf nodes for the random forests)" and exploring "batch size/width space" and "learning rate decay", but it does not provide concrete hyperparameter values or detailed training configurations in the main text.
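
To make the ensembling setup referenced in the table concrete, the following is a minimal illustrative sketch, not the authors' code: it builds a majority-vote ensemble of decision trees whose capacity is controlled by the number of leaf nodes (one of the capacity hyper-parameters the paper varies) and compares the ensemble's test error with the average error of its members. The synthetic dataset, ensemble size M, and leaf-node values are assumptions chosen only for illustration.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative binary classification data (a stand-in for the paper's datasets).
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

M = 10                              # number of classifiers in the ensemble (assumed)
rng = np.random.default_rng(0)

for leaf_nodes in (8, 64, 512):     # capacity hyper-parameter (assumed values)
    members = []
    for m in range(M):
        # Bootstrap resampling gives the ensemble members some diversity.
        idx = rng.integers(0, len(X_tr), len(X_tr))
        clf = DecisionTreeClassifier(max_leaf_nodes=leaf_nodes, random_state=m)
        members.append(clf.fit(X_tr[idx], y_tr[idx]))

    preds = np.stack([clf.predict(X_te) for clf in members])   # shape (M, n_test)
    majority = (preds.mean(axis=0) > 0.5).astype(int)          # majority vote (binary labels)
    avg_member_err = np.mean([np.mean(p != y_te) for p in preds])
    ensemble_err = np.mean(majority != y_te)
    print(f"leaf_nodes={leaf_nodes}: avg member error={avg_member_err:.3f}, "
          f"ensemble error={ensemble_err:.3f}")

Comparing the average member error with the majority-vote error across capacities mirrors, in miniature, the kind of comparison the paper's empirical study performs when identifying scenarios where ensembling does and does not yield large improvements.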