When are ensembles really effective?

Authors: Ryan Theisen, Hyunsuk Kim, Yaoqing Yang, Liam Hodgkinson, Michael W. Mahoney

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study, both theoretically and empirically, the fundamental question of when ensembling yields significant performance improvements in classification tasks. Theoretically, we prove new results... To complement this theory, we study ensembling empirically in a variety of settings, verifying the predictions made by our theory, and identifying practical scenarios where ensembling does and does not result in large performance improvements.
Researcher Affiliation | Academia | Ryan Theisen, Department of Statistics, University of California, Berkeley (theisen@berkeley.edu); Hyunsuk Kim, Department of Statistics, University of California, Berkeley (hyskim7@berkeley.edu); Yaoqing Yang, Department of Computer Science, Dartmouth College (Yaoqing.Yang@dartmouth.edu); Liam Hodgkinson, School of Mathematics and Statistics, University of Melbourne, Australia (lhodgkinson@unimelb.edu.au); Michael W. Mahoney, International Computer Science Institute, Lawrence Berkeley National Laboratory, and Department of Statistics, University of California, Berkeley (mmahoney@stat.berkeley.edu)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | Table 1 ("Datasets and ensembles used in empirical evaluations, where C denotes the number of classes, and M denotes the number of classifiers") lists: MNIST (5K subset, C=10) [Den12], CIFAR-10 (C=10) [KH+09], IMDB (C=2) [MDP+11], QSAR (C=2) [BGCT19], Thyroid (C=2) [QCHL87], GLUE (7 tasks, C=2-3) [WSM+19].
Dataset Splits | No | The paper notes that "more extensive experimental details can be found in Appendix B.1" (the appendix is not provided in the snippet), but the main text does not specify exact dataset split percentages or sample counts for training, validation, or test sets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used to run its experiments, offering only general statements such as "large scale studies" or "models trained".
Software Dependencies | No | The paper does not specify the ancillary software (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | No | The paper mentions "varying capacity hyper-parameters (width for the ResNet18 models, number of random features for the random feature classifiers, and number of leaf nodes for the random forests)" and exploring "batch size/width space" and "learning rate decay", but it does not provide concrete hyperparameter values or detailed training configurations in the main text.
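
To make the ensembling setup referenced in the table concrete, the following is a minimal illustrative sketch, not the authors' code: it builds a majority-vote ensemble of decision trees whose capacity is controlled by the number of leaf nodes (one of the capacity hyper-parameters the paper varies) and compares the ensemble's test error with the average error of its members. The synthetic dataset, ensemble size M, and leaf-node values are assumptions chosen only for illustration.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative binary classification data (a stand-in for the paper's datasets).
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

M = 10                              # number of classifiers in the ensemble (assumed)
rng = np.random.default_rng(0)

for leaf_nodes in (8, 64, 512):     # capacity hyper-parameter (assumed values)
    members = []
    for m in range(M):
        # Bootstrap resampling gives the ensemble members some diversity.
        idx = rng.integers(0, len(X_tr), len(X_tr))
        clf = DecisionTreeClassifier(max_leaf_nodes=leaf_nodes, random_state=m)
        members.append(clf.fit(X_tr[idx], y_tr[idx]))

    preds = np.stack([clf.predict(X_te) for clf in members])   # shape (M, n_test)
    majority = (preds.mean(axis=0) > 0.5).astype(int)          # majority vote (binary labels)
    avg_member_err = np.mean([np.mean(p != y_te) for p in preds])
    ensemble_err = np.mean(majority != y_te)
    print(f"leaf_nodes={leaf_nodes}: avg member error={avg_member_err:.3f}, "
          f"ensemble error={ensemble_err:.3f}")

Comparing the average member error with the majority-vote error across capacities mirrors, in miniature, the kind of comparison the paper's empirical study performs when identifying scenarios where ensembling does and does not yield large improvements.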