Shrub Ensembles for Online Classification

Authors: Sebastian Buschjäger, Sibylle Hess, Katharina J. Morik (pp. 6123–6131)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In a series of 2 959 experiments on 12 different datasets, we compare our method against 8 state-of-the-art methods. Our Shrub Ensembles retain an excellent performance even when only little memory is available. We show that SE offers a better accuracy-memory trade-off in 7 of 12 cases, while having a statistically significant better performance than most other methods.
Researcher Affiliation | Academia | (1) Artificial Intelligence Group, TU Dortmund, Germany; (2) Data Mining Group, Technische Universiteit Eindhoven, Eindhoven, the Netherlands
Pseudocode | Yes | Algorithm 1: Shrub Ensembles. (A rough, hypothetical sketch of the general idea appears below this table.)
Open Source Code | Yes | Our implementation is available under https://github.com/sbuschjaeger/se-online.
Open Datasets | No | The paper mentions using '12 different datasets depicted in the appendix' and some well-known dataset names in Table 1, but it does not provide concrete access information (e.g., specific links, DOIs, or citations with authors/year) for them in the main text.
Dataset Splits | No | The paper reports 'average test-then-train accuracy' and discusses hyperparameter optimization, but it does not explicitly detail training, validation, or test splits, nor does it specify cross-validation settings. (A sketch of the test-then-train protocol appears below this table.)
Hardware Specification | Yes | For the experiments we used a cluster node with 256 AMD EPYC 7742 CPUs and 1TB RAM in total.
Software Dependencies | No | The paper mentions software like 'PyTorch' and states 'Our SE method used our own C++ implementation' and 'MOA since it is implemented in Java', but it does not provide specific version numbers for any of these software dependencies.
Experiment Setup | No | In a series of preliminary experiments, we identify reasonable ranges for each hyperparameter and method (e.g., number of trees in an ensemble, window size, step sizes etc.). Then, for each method and dataset we sample at most 50 random hyperparameter configurations from these ranges and evaluate their performance. An example of such a configuration can be found in the appendix and further details can be taken from the source code. (A minimal random-search sketch of this procedure appears below this table.)
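
The Pseudocode row above points to Algorithm 1 (Shrub Ensembles), which is not reproduced on this page. As a loose illustration only, the sketch below implements a generic online ensemble of small, weighted decision trees with hard pruning to a fixed ensemble size; the per-window tree training, the squared-loss weight update, and the top-K pruning are assumptions made for this page and are not the authors' Algorithm 1 or their C++ implementation.

```python
# Loose, assumption-laden sketch of an online ensemble of small weighted
# decision trees with hard pruning to a fixed size. This is NOT the authors'
# Algorithm 1: the squared-loss weight update, the per-window tree training,
# and the top-K pruning below are illustrative assumptions only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class SmallTreeEnsemble:
    def __init__(self, n_classes, max_depth=2, max_trees=16, lr=0.1):
        self.n_classes, self.max_depth = n_classes, max_depth
        self.max_trees, self.lr = max_trees, lr
        self.trees, self.weights = [], np.zeros(0)

    def _proba(self, tree, X):
        # pad per-tree probabilities to all n_classes columns
        # (assumes integer class labels 0..n_classes-1)
        p = np.zeros((len(X), self.n_classes))
        p[:, tree.classes_.astype(int)] = tree.predict_proba(X)
        return p

    def predict(self, X):
        if not self.trees:
            return np.zeros(len(X), dtype=int)
        P = np.stack([self._proba(t, X) for t in self.trees])   # (T, n, C)
        return np.tensordot(self.weights, P, axes=1).argmax(axis=1)

    def update(self, X, y):
        # 1) fit one new small tree ("shrub") on the current window
        self.trees.append(DecisionTreeClassifier(max_depth=self.max_depth).fit(X, y))
        self.weights = np.append(self.weights, 1.0 / len(self.trees))
        # 2) gradient step on the squared loss of the weighted vote
        P = np.stack([self._proba(t, X) for t in self.trees])   # (T, n, C)
        residual = np.tensordot(self.weights, P, axes=1) - np.eye(self.n_classes)[y]
        self.weights -= self.lr * np.einsum("tnc,nc->t", P, residual) / len(X)
        # 3) hard pruning: keep only the max_trees largest weights
        if len(self.trees) > self.max_trees:
            keep = np.sort(np.argsort(self.weights)[-self.max_trees:])
            self.trees = [self.trees[i] for i in keep]
            self.weights = self.weights[keep]
```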
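
The Dataset Splits row refers to "average test-then-train accuracy". The following is a minimal sketch of the standard prequential (test-then-train) protocol, in which each incoming example is first used for evaluation and then for training; the stand-in learner (scikit-learn's SGDClassifier) and the synthetic stream are assumptions and do not reflect the paper's datasets.

```python
# Minimal sketch of prequential (test-then-train) evaluation: every example
# is first used for testing, then for training. The stand-in learner and
# the synthetic stream below are assumptions, not the paper's setup.
import numpy as np
from sklearn.linear_model import SGDClassifier

def test_then_train(stream, model, classes):
    correct, tested = 0, 0
    for i, (x, y) in enumerate(stream):
        x = np.asarray(x).reshape(1, -1)
        if i > 0:                       # cannot predict before the first fit
            correct += int(model.predict(x)[0] == y)
            tested += 1
        model.partial_fit(x, [y], classes=classes)
    return correct / max(tested, 1)     # average test-then-train accuracy

# Hypothetical usage on a synthetic binary stream of 1,000 examples.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] > 0).astype(int)
acc = test_then_train(zip(X, y), SGDClassifier(), classes=[0, 1])
print(f"test-then-train accuracy: {acc:.3f}")
```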
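
The Experiment Setup row describes sampling at most 50 random hyperparameter configurations per method and dataset from preliminarily identified ranges. Below is a minimal random-search sketch of that procedure; the parameter names, ranges, and the evaluate() placeholder are hypothetical and are not the ranges used in the paper.

```python
# Minimal random-search sketch: sample up to 50 configurations from
# hand-picked ranges and keep the best-scoring one. Parameter names,
# ranges, and the evaluate() placeholder are hypothetical.
import random

SEARCH_SPACE = {                       # illustrative ranges, not the paper's
    "n_trees":     [8, 16, 32, 64],
    "window_size": [32, 64, 128, 256],
    "step_size":   [1e-3, 1e-2, 1e-1],
}

def random_search(evaluate, space, n_configs=50, seed=0):
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_configs):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(cfg)          # e.g. average test-then-train accuracy
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical usage with a dummy objective that favours small step sizes.
best, score = random_search(lambda cfg: -cfg["step_size"], SEARCH_SPACE)
print(best, score)
```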