reproducibilityindex.ai

Optimal Counterfactual Explanations in Tree Ensembles

Authors: Axel Parmentier, Thibaut Vidal

ICML 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experimental analyses demonstrate that the proposed search approach requires a computational effort that is orders of magnitude smaller than previous mathematical programming algorithms. We conduct an extensive and reproducible experimental campaign, which can be executed from a single self-contained Python script.
Researcher Affiliation	Academia	(1) CERMICS, Ecole des Ponts Paristech; (2) CIRRELT & SCALEAI Chair in Data-Driven Supply Chains, Department of Mathematics and Industrial Engineering, Polytechnique Montreal, Canada; (3) Department of Computer Science, Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Brazil.
Pseudocode	No	The paper describes mathematical formulations and processes but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Our source code is openly accessible at https://github.com/vidalt/OCEAN under a MIT license.
Open Datasets	Yes	We conduct our experiments on eight data sets representative of diverse applications such as loan approval, socioeconomical studies, pretrial bail, news performance prediction, and malware detection. Table 2 reports their number of samples (n), number of features (total = p, numerical = p_N, binary = p_B, and categorical = p_C), and source of origin. Table 2 lists the following data sets: AD: Adult (UCI), CC: Credit Card Default (UCI), CP: COMPAS (ProPublica), GC: German Credit (UCI), ON: Online News (UCI), PH: Data Phishing (UCI), SP: Spambase (UCI), ST: Students Performance (UCI).
Dataset Splits	Yes	Each data set has been randomly split into 80% training and 20% test set.
Hardware Specification	Yes	All experiments have been run on four threads of an Intel Core i9-9880H 2.30GHz CPU with 64GB of available RAM, running Ubuntu 20.04.1 LTS.
Software Dependencies	Yes	We use scikit-learn v0.23.0 for training random forests and Gurobi 9.1 (via gurobipy) for solving the mathematical models.
Experiment Setup	Yes	For each data set, we generated a single random forest with 100 trees limited at depth 5. We selected 20 different negative samples from the test set to serve as origin points for the counterfactual explanations. To standardize the analyses between different data sets, we opted to set actionability constraints on two columns wherever applicable: age is constrained to be non-decreasing, and sex always stays fixed. To simulate differences of actionability among features, the marginal weights c_i^- and c_i^+ of each feature in the objective have been independently drawn in the uniform distribution U(0.5, 2).
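
The split and setup rows above can be made concrete with a small sketch. This is not the authors' code (that lives in the OCEAN repository). It is a standard-library illustration on synthetic rows, and it assumes the objective is a weighted L1 distance with separate per-feature weights for decreases (c_minus) and increases (c_plus). All function names, the synthetic feature layout, and the seed are hypothetical. Training the 100-tree, depth-5 random forest is omitted.

```python
import random


def split_80_20(rows, rng):
    """Randomly split rows into an 80% training set and a 20% test set."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def draw_cost_weights(n_features, rng):
    """Draw decrease/increase weights independently from U(0.5, 2)."""
    c_minus = [rng.uniform(0.5, 2.0) for _ in range(n_features)]
    c_plus = [rng.uniform(0.5, 2.0) for _ in range(n_features)]
    return c_minus, c_plus


def weighted_cost(origin, candidate, c_minus, c_plus):
    """Asymmetric weighted L1 distance (assumed form of the objective)."""
    total = 0.0
    for x0, x1, cm, cp in zip(origin, candidate, c_minus, c_plus):
        delta = x1 - x0
        # Increases are priced with c_plus, decreases with c_minus.
        total += cp * delta if delta > 0 else cm * (-delta)
    return total


def is_actionable(origin, candidate, age_idx, sex_idx):
    """Age may not decrease and sex must stay fixed."""
    return (candidate[age_idx] >= origin[age_idx]
            and candidate[sex_idx] == origin[sex_idx])


# Synthetic rows: ([age, sex, score], label). Purely illustrative data.
rng = random.Random(0)
rows = [([rng.randint(18, 70), rng.randint(0, 1), rng.random()], i % 2)
        for i in range(1000)]

train, test = split_80_20(rows, rng)

# Pick 20 distinct negative test samples as counterfactual origin points.
negatives = [features for features, label in test if label == 0]
origins = rng.sample(negatives, 20)

c_minus, c_plus = draw_cost_weights(3, rng)
print(len(train), len(test), len(origins))
```

A counterfactual search would then minimize weighted_cost over candidates that pass is_actionable and that the forest classifies as positive. In the paper that search is solved as a mathematical program with Gurobi, which this sketch does not attempt.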