When are ensembles really effective?
Authors: Ryan Theisen, Hyunsuk Kim, Yaoqing Yang, Liam Hodgkinson, Michael W. Mahoney
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study, both theoretically and empirically, the fundamental question of when ensembling yields significant performance improvements in classification tasks. Theoretically, we prove new results... To complement this theory, we study ensembling empirically in a variety of settings, verifying the predictions made by our theory, and identifying practical scenarios where ensembling does and does not result in large performance improvements. |
| Researcher Affiliation | Academia | Ryan Theisen, Department of Statistics, University of California, Berkeley (theisen@berkeley.edu); Hyunsuk Kim, Department of Statistics, University of California, Berkeley (hyskim7@berkeley.edu); Yaoqing Yang, Department of Computer Science, Dartmouth College (Yaoqing.Yang@dartmouth.edu); Liam Hodgkinson, School of Mathematics and Statistics, University of Melbourne, Australia (lhodgkinson@unimelb.edu.au); Michael W. Mahoney, International Computer Science Institute, Lawrence Berkeley National Laboratory, and Department of Statistics, University of California, Berkeley (mmahoney@stat.berkeley.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | Table 1: Datasets and ensembles used in empirical evaluations, where C denotes the number of classes and M denotes the number of classifiers. Datasets: MNIST (5K subset), C=10 [Den12]; CIFAR-10, C=10 [KH+09]; IMDB, C=2 [MDP+11]; QSAR, C=2 [BGCT19]; Thyroid, C=2 [QCHL87]; GLUE (7 tasks), C=2–3 [WSM+19] |
| Dataset Splits | No | The paper notes that "more extensive experimental details can be found in Appendix B.1," but the main text does not specify exact dataset split percentages or sample counts for training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments, only general statements like "large scale studies" or "models trained". |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | No | The paper mentions "varying capacity hyper-parameters (width for the ResNet18 models, number of random features for the random feature classifiers, and number of leaf nodes for the random forests)" and exploring "batch size/width space" and "learning rate decay", but it does not provide concrete hyperparameter values or detailed training configurations in the main text. |
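The paper's central question, per the abstract quoted above, is when majority-vote ensembling yields a significant improvement over individual classifiers. A minimal simulation illustrates the classical intuition this builds on: when the ensemble members' errors are independent (an idealization that real ensembles typically violate, which is precisely why the question is nontrivial), majority voting over M classifiers sharply boosts accuracy. The sample size, ensemble size, and per-model accuracy below are hypothetical illustrative values, not the paper's experimental settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical settings: n test points, M ensemble members (odd, so no ties),
# each member correct independently with probability p.
n, M, p = 10_000, 15, 0.7
y = rng.integers(0, 2, size=n)  # binary ground-truth labels

# Simulate independent per-model correctness, then the implied predictions.
correct = rng.random((M, n)) < p
preds = np.where(correct, y, 1 - y)

# Average accuracy of a single ensemble member.
single_acc = correct.mean()

# Majority-vote prediction of the ensemble and its accuracy.
majority = (preds.sum(axis=0) > M / 2).astype(int)
ensemble_acc = (majority == y).mean()

print(f"single-model accuracy:  {single_acc:.3f}")
print(f"majority-vote accuracy: {ensemble_acc:.3f}")
```

Under the independence assumption, the majority vote is far more accurate than any single member (here roughly 0.95 vs 0.70); with correlated errors the gap shrinks, which is the regime the paper characterizes theoretically and empirically.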