Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Free Lunch in the Forest: Functionally-Identical Pruning of Boosted Tree Ensembles

Authors: Youssouf Emine, Alexandre Forel, Idriss Malek, Thibaut Vidal

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In multiple computational experiments, we show that our approach is a "free lunch", significantly reducing the ensemble size without altering the model's behavior. Thus, we can preserve state-of-the-art performance at a fraction of the original model's size.
Researcher Affiliation Academia ¹Department of Mathematics and Industrial Engineering, Polytechnique Montréal; ²Canada Excellence Research Chair in Data-Science for Real-time Decision-Making (CERC); ³CIRRELT & SCALE-AI Chair in Data-Driven Supply Chains. EMAIL, EMAIL
Pseudocode Yes Algorithm 1: Faithful pruning algorithm
Open Source Code Yes Package: www.github.com/eminyous/fipe | Code: www.github.com/eminyous/fipe-experiments
Open Datasets Yes We perform our analyses across 11 datasets commonly used in previous studies. The characteristics of the datasets are presented in Table 2: their number of samples n, number of features p (including the number of numerical p_N and binary p_B features), and number of classes C.
Dataset Splits Yes In each experiment, we split the dataset into a training set (80%) and a test set (20%) to measure faithfulness and test accuracy. This process is repeated over five different random seeds.
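The split protocol quoted above (80/20, repeated over five seeds) can be sketched as follows. This is a minimal standard-library illustration; the helper name and seed values are assumptions, not the authors' code:

```python
import random

def split_indices(n_samples, train_frac=0.8, seed=0):
    """Shuffle sample indices with a fixed seed, then split 80/20."""
    idx = list(range(n_samples))
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(idx)
    cut = int(train_frac * n_samples)
    return idx[:cut], idx[cut:]

# Repeat over five random seeds, as in the paper's protocol.
splits = [split_indices(100, seed=s) for s in range(5)]
for train_idx, test_idx in splits:
    assert len(train_idx) == 80 and len(test_idx) == 20
```

In practice the paper's experiments use scikit-learn, whose `train_test_split(X, y, test_size=0.2, random_state=seed)` provides the same behavior in one call.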
Hardware Specification Yes The experiments are conducted on a computing grid. Each experiment utilizes a single core of an Intel Xeon Gold 6258R CPU running at 2.7 GHz and is provided with 4 GB RAM.
Software Dependencies Yes All experiments are implemented in Python. We used scikit-learn for training the ensembles. All optimization problems are solved to global optimality using the commercial solver Gurobi v11.0.
Experiment Setup Yes First, we evaluate how much FIPE is able to prune tree ensembles while certifying faithfulness to the original model. We compare both the certifiably minimal-size and the fast-but-approximate variants, denoted FIPE 0 and FIPE 1, respectively. We measure the number of learners in the pruned ensemble, the faithfulness to the original ensemble on the test set (denoted FI), the test accuracy (ACC), and the number of oracle calls rounded up to the nearest integer (denoted K). All performance metrics are averaged over the five repetitions. The results are presented in Table 1 for ensembles of 50 learners.
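The faithfulness metric FI quoted above compares the pruned ensemble's predictions with the original ensemble's on the test set. A minimal sketch of such an agreement rate, assuming predictions are given as label sequences (the function name and representation are illustrative, not taken from the paper's code):

```python
def faithfulness(original_preds, pruned_preds):
    """Fraction of test samples on which the pruned ensemble
    reproduces the original ensemble's prediction exactly."""
    if len(original_preds) != len(pruned_preds):
        raise ValueError("prediction sequences must have equal length")
    matches = sum(o == p for o, p in zip(original_preds, pruned_preds))
    return matches / len(original_preds)

# A functionally-identical ("free lunch") pruning achieves FI = 1.0.
print(faithfulness([0, 1, 1, 0], [0, 1, 1, 0]))  # -> 1.0
print(faithfulness([0, 1, 1, 0], [0, 1, 0, 0]))  # -> 0.75
```

Note that FIPE certifies faithfulness over the whole input space via an optimization oracle, which is stronger than the empirical test-set agreement measured here.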