Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Tree Ensemble Explainability through the Hoeffding Functional Decomposition and TreeHFD Algorithm

Authors: Clément Bénard

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The high performance of TreeHFD is demonstrated through experiments on both simulated and real data.
Researcher Affiliation	Industry	Clément Bénard, Thales cortAIx-Labs, SINCLAIR AI Lab, 1 avenue Augustin Fresnel, 91120 Palaiseau, France, EMAIL
Pseudocode	No	The paper describes the TreeHFD algorithm in Section 3, detailing its mathematical formulation and components, but does not present it in a structured pseudocode or algorithm block.
Open Source Code	Yes	The high performance of TreeHFD is demonstrated through experiments on both simulated and real data, using our treehfd Python package (https://github.com/ThalesGroup/treehfd).
Open Datasets	Yes	We assess the performance of TreeHFD using nine real public datasets from the UCI repository (Kelly et al., 2024)
Dataset Splits	Yes	Next, we fit a boosted tree ensemble on a training dataset Dn of size n = 5000... Finally, we estimate the Mean Square Error (MSE) of each TreeHFD component, using the analytical formulas and an independent testing dataset. ... We randomly split the dataset in two halves, and compute both EBM and TreeHFD decompositions with each subsample.
Hardware Specification	Yes	Experiments were conducted with a standard computer machine with Ubuntu OS and the following main characteristics: Intel Core i5 CPU (2.30 GHz) with 6 cores and 16 GB of RAM.
Software Dependencies	No	Notice that we use xgboost software in the experiments, in accordance with its Apache License 2.0. ... using our treehfd Python package... implemented in the R-package glex available online
Experiment Setup	Yes	Next, we fit a boosted tree ensemble on a training dataset Dn of size n = 5000, using xgboost with M = 100 trees, and the default value for the other parameters. ... TreeHFD with interactions of second order (i.e. d_I = 2)... varying tree depths (instead of the default value of 6 used in the article).
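
The settings quoted in the Dataset Splits and Experiment Setup rows can be collected into a small sketch. This is only an illustration under stated assumptions, not the author's code: the dictionary keys, the function name split_in_halves, the seed, and the example row count are all hypothetical, and the sketch uses only the Python standard library rather than xgboost or treehfd.

```python
import random

# Values quoted in the table above. The key names are illustrative choices,
# not identifiers taken from the paper or from the treehfd package.
SETUP = {
    "n_train": 5000,         # training set size n (simulated data)
    "n_trees": 100,          # M, number of boosted trees
    "max_depth": 6,          # default tree depth reported for the article
    "interaction_order": 2,  # second-order interactions
}


def split_in_halves(n_rows, seed=0):
    """Randomly split row indices 0..n_rows-1 into two disjoint halves.

    Assumption: the paper only says the dataset is randomly split in two
    halves; the seed and the handling of an odd row count are choices
    made for this sketch.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    middle = n_rows // 2
    return sorted(indices[:middle]), sorted(indices[middle:])


# Hypothetical dataset with 1000 rows; each half would then get its own
# fitted model and decomposition.
first_half, second_half = split_in_halves(1000)
print(len(first_half), len(second_half))
```

Fixing the seed makes the two subsamples repeatable across runs, which matters when decompositions computed on each half are compared with one another.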