Shrub Ensembles for Online Classification

Authors: Sebastian Buschjäger, Sibylle Hess, Katharina J. Morik (pp. 6123–6131)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In a series of 2 959 experiments on 12 different datasets, we compare our method against 8 state-of-the-art methods. Our Shrub Ensembles retain an excellent performance even when only little memory is available. We show that SE offers a better accuracy-memory trade-off in 7 of 12 cases, while having a statistically significant better performance than most other methods.
Researcher Affiliation | Academia | (1) Artificial Intelligence Group, TU Dortmund, Germany; (2) Data Mining Group, Technische Universiteit Eindhoven, Eindhoven, the Netherlands
Pseudocode | Yes | Algorithm 1: Shrub Ensembles. (A rough, hypothetical sketch of the general idea appears below this table.)
Open Source Code | Yes | Our implementation is available under https://github.com/sbuschjaeger/se-online.
Open Datasets | No | The paper mentions using '12 different datasets depicted in the appendix' and some well-known dataset names in Table 1, but it does not provide concrete access information (e.g., specific links, DOIs, or citations with authors/year) for them in the main text.
Dataset Splits | No | The paper reports 'average test-then-train accuracy' and discusses hyperparameter optimization, but it does not explicitly detail training, validation, or test splits, nor does it specify cross-validation settings. (A sketch of the test-then-train protocol appears below this table.)
Hardware Specification | Yes | For the experiments we used a cluster node with 256 AMD EPYC 7742 CPUs and 1TB RAM in total.
Software Dependencies | No | The paper mentions software like 'PyTorch' and states 'Our SE method used our own C++ implementation' and 'MOA since it is implemented in Java', but it does not provide specific version numbers for any of these software dependencies.
Experiment Setup | No | In a series of preliminary experiments, we identify reasonable ranges for each hyperparameter and method (e.g., number of trees in an ensemble, window size, step sizes etc.). Then, for each method and dataset we sample at most 50 random hyperparameter configurations from these ranges and evaluate their performance. An example of such a configuration can be found in the appendix and further details can be taken from the source code. (A minimal random-search sketch of this procedure appears below this table.)
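
The Pseudocode row above points to Algorithm 1 (Shrub Ensembles), which is not reproduced on this page. As a loose illustration only, the sketch below implements a generic online ensemble of small, weighted decision trees with hard pruning to a fixed ensemble size; the per-window tree training, the squared-loss weight update, and the top-K pruning are assumptions made for this page and are not the authors' Algorithm 1 or their C++ implementation.

```python
# Loose, assumption-laden sketch of an online ensemble of small weighted
# decision trees with hard pruning to a fixed size. This is NOT the authors'
# Algorithm 1: the squared-loss weight update, the per-window tree training,
# and the top-K pruning below are illustrative assumptions only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class SmallTreeEnsemble:
    def __init__(self, n_classes, max_depth=2, max_trees=16, lr=0.1):
        self.n_classes, self.max_depth = n_classes, max_depth
        self.max_trees, self.lr = max_trees, lr
        self.trees, self.weights = [], np.zeros(0)

    def _proba(self, tree, X):
        # pad per-tree probabilities to all n_classes columns
        # (assumes integer class labels 0..n_classes-1)
        p = np.zeros((len(X), self.n_classes))
        p[:, tree.classes_.astype(int)] = tree.predict_proba(X)
        return p

    def predict(self, X):
        if not self.trees:
            return np.zeros(len(X), dtype=int)
        P = np.stack([self._proba(t, X) for t in self.trees])   # (T, n, C)
        return np.tensordot(self.weights, P, axes=1).argmax(axis=1)

    def update(self, X, y):
        # 1) fit one new small tree ("shrub") on the current window
        self.trees.append(DecisionTreeClassifier(max_depth=self.max_depth).fit(X, y))
        self.weights = np.append(self.weights, 1.0 / len(self.trees))
        # 2) gradient step on the squared loss of the weighted vote
        P = np.stack([self._proba(t, X) for t in self.trees])   # (T, n, C)
        residual = np.tensordot(self.weights, P, axes=1) - np.eye(self.n_classes)[y]
        self.weights -= self.lr * np.einsum("tnc,nc->t", P, residual) / len(X)
        # 3) hard pruning: keep only the max_trees largest weights
        if len(self.trees) > self.max_trees:
            keep = np.sort(np.argsort(self.weights)[-self.max_trees:])
            self.trees = [self.trees[i] for i in keep]
            self.weights = self.weights[keep]
```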
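
The Dataset Splits row refers to "average test-then-train accuracy". The following is a minimal sketch of the standard prequential (test-then-train) protocol, in which each incoming example is first used for evaluation and then for training; the stand-in learner (scikit-learn's SGDClassifier) and the synthetic stream are assumptions and do not reflect the paper's datasets.

```python
# Minimal sketch of prequential (test-then-train) evaluation: every example
# is first used for testing, then for training. The stand-in learner and
# the synthetic stream below are assumptions, not the paper's setup.
import numpy as np
from sklearn.linear_model import SGDClassifier

def test_then_train(stream, model, classes):
    correct, tested = 0, 0
    for i, (x, y) in enumerate(stream):
        x = np.asarray(x).reshape(1, -1)
        if i > 0:                       # cannot predict before the first fit
            correct += int(model.predict(x)[0] == y)
            tested += 1
        model.partial_fit(x, [y], classes=classes)
    return correct / max(tested, 1)     # average test-then-train accuracy

# Hypothetical usage on a synthetic binary stream of 1,000 examples.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] > 0).astype(int)
acc = test_then_train(zip(X, y), SGDClassifier(), classes=[0, 1])
print(f"test-then-train accuracy: {acc:.3f}")
```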
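
The Experiment Setup row describes sampling at most 50 random hyperparameter configurations per method and dataset from preliminarily identified ranges. Below is a minimal random-search sketch of that procedure; the parameter names, ranges, and the evaluate() placeholder are hypothetical and are not the ranges used in the paper.

```python
# Minimal random-search sketch: sample up to 50 configurations from
# hand-picked ranges and keep the best-scoring one. Parameter names,
# ranges, and the evaluate() placeholder are hypothetical.
import random

SEARCH_SPACE = {                       # illustrative ranges, not the paper's
    "n_trees":     [8, 16, 32, 64],
    "window_size": [32, 64, 128, 256],
    "step_size":   [1e-3, 1e-2, 1e-1],
}

def random_search(evaluate, space, n_configs=50, seed=0):
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_configs):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(cfg)          # e.g. average test-then-train accuracy
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical usage with a dummy objective that favours small step sizes.
best, score = random_search(lambda cfg: -cfg["step_size"], SEARCH_SPACE)
print(best, score)
```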