Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Free Lunch in the Forest: Functionally-Identical Pruning of Boosted Tree Ensembles

Authors: Youssouf Emine, Alexandre Forel, Idriss Malek, Thibaut Vidal

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In multiple computational experiments, we show that our approach is a "free lunch", significantly reducing the ensemble size without altering the model's behavior. Thus, we can preserve state-of-the-art performance at a fraction of the original model's size.
Researcher Affiliation Academia ¹Department of Mathematics and Industrial Engineering, Polytechnique Montréal; ²Canada Excellence Research Chair in Data-Science for Real-time Decision-Making (CERC); ³CIRRELT & SCALE-AI Chair in Data-Driven Supply Chains. EMAIL, EMAIL
Pseudocode Yes Algorithm 1: Faithful pruning algorithm
Open Source Code Yes Package: www.github.com/eminyous/fipe | Code: www.github.com/eminyous/fipe-experiments
Open Datasets Yes We perform our analyses across 11 datasets commonly used in previous studies. The characteristics of the datasets are presented in Table 2: their number of samples n, number of features p (including the number of numerical p_N and binary p_B features), and number of classes C.
Dataset Splits Yes In each experiment, we split the dataset into a training set (80%) and a test set (20%) to measure faithfulness and test accuracy. This process is repeated over five different random seeds.
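The split protocol quoted above (80/20, repeated over five seeds) can be sketched as follows. This is a minimal standard-library illustration; the helper name and seed values are assumptions, not the authors' code:

```python
import random

def split_indices(n_samples, train_frac=0.8, seed=0):
    """Shuffle sample indices with a fixed seed, then split 80/20."""
    idx = list(range(n_samples))
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(idx)
    cut = int(train_frac * n_samples)
    return idx[:cut], idx[cut:]

# Repeat over five random seeds, as in the paper's protocol.
splits = [split_indices(100, seed=s) for s in range(5)]
for train_idx, test_idx in splits:
    assert len(train_idx) == 80 and len(test_idx) == 20
```

In practice the paper's experiments use scikit-learn, whose `train_test_split(X, y, test_size=0.2, random_state=seed)` provides the same behavior in one call.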
Hardware Specification Yes The experiments are conducted on a computing grid. Each experiment utilizes a single core of an Intel Xeon Gold 6258R CPU running at 2.7 GHz and is provided with 4 GB RAM.
Software Dependencies Yes All experiments are implemented in Python. We used scikit-learn for training the ensembles. All optimization problems are solved to global optimality using the commercial solver Gurobi v11.0.
Experiment Setup Yes First, we evaluate how much FIPE is able to prune tree ensembles while certifying faithfulness to the original model. We compare both the certifiably minimal-size and the fast-but-approximate variants, denoted FIPE 0 and FIPE 1, respectively. We measure the number of learners in the pruned ensemble, the faithfulness to the original ensemble on the test set (denoted FI), the test accuracy (ACC), and the number of oracle calls rounded up to the nearest integer (denoted K). All performance metrics are averaged over the five repetitions. The results are presented in Table 1 for ensembles of 50 learners.
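The faithfulness metric FI quoted above compares the pruned ensemble's predictions with the original ensemble's on the test set. A minimal sketch of such an agreement rate, assuming predictions are given as label sequences (the function name and representation are illustrative, not taken from the paper's code):

```python
def faithfulness(original_preds, pruned_preds):
    """Fraction of test samples on which the pruned ensemble
    reproduces the original ensemble's prediction exactly."""
    if len(original_preds) != len(pruned_preds):
        raise ValueError("prediction sequences must have equal length")
    matches = sum(o == p for o, p in zip(original_preds, pruned_preds))
    return matches / len(original_preds)

# A functionally-identical ("free lunch") pruning achieves FI = 1.0.
print(faithfulness([0, 1, 1, 0], [0, 1, 1, 0]))  # -> 1.0
print(faithfulness([0, 1, 1, 0], [0, 1, 0, 0]))  # -> 0.75
```

Note that FIPE certifies faithfulness over the whole input space via an optimization oracle, which is stronger than the empirical test-set agreement measured here.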