Optimal Counterfactual Explanations in Tree Ensembles
Authors: Axel Parmentier, Thibaut Vidal
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental analyses demonstrate that the proposed search approach requires a computational effort that is orders of magnitude smaller than previous mathematical programming algorithms. We conduct an extensive and reproducible experimental campaign, which can be executed from a single self-contained Python script. |
| Researcher Affiliation | Academia | ¹CERMICS, Ecole des Ponts Paristech; ²CIRRELT & SCALEAI Chair in Data-Driven Supply Chains, Department of Mathematics and Industrial Engineering, Polytechnique Montreal, Canada; ³Department of Computer Science, Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Brazil. |
| Pseudocode | No | The paper describes mathematical formulations and processes but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our source code is openly accessible at https://github.com/vidalt/OCEAN under a MIT license. |
| Open Datasets | Yes | We conduct our experiments on eight data sets representative of diverse applications such as loan approval, socioeconomical studies, pretrial bail, news performance prediction, and malware detection. Table 2 reports their number of samples ($n$), number of features (total = $p$, numerical = $p_N$, binary = $p_B$, and categorical = $p_C$), and source of origin. Table 2 lists datasets like AD: Adult (UCI), CC: Credit Card Default (UCI), CP: COMPAS (ProPublica), GC: German Credit (UCI), ON: Online News (UCI), PH: Data Phishing (UCI), SP: Spambase (UCI), ST: Students Performance (UCI). |
| Dataset Splits | Yes | Each data set has been randomly split into 80% training and 20% test set. |
| Hardware Specification | Yes | All experiments have been run on four threads of an Intel Core i9-9880H 2.30GHz CPU with 64GB of available RAM, running Ubuntu 20.04.1 LTS. |
| Software Dependencies | Yes | We use scikit-learn v0.23.0 for training random forests and Gurobi 9.1 (via gurobipy) for solving the mathematical models. |
| Experiment Setup | Yes | For each data set, we generated a single random forest with 100 trees limited at depth 5. We selected 20 different negative samples from the test set to serve as origin points for the counterfactual explanations. To standardize the analyses between different data sets, we opted to set actionability constraints on two columns wherever applicable: age is constrained to be non-decreasing, and sex always stays fixed. To simulate differences of actionability among features, the marginal weights $c_i^-$ and $c_i^+$ of each feature in the objective have been independently drawn in the uniform distribution U(0.5, 2). |
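
The setup in the last table row can be illustrated with a minimal sketch that uses only the Python standard library. It is not the authors' code: the function names, the column indices, and the asymmetric weighted-distance cost are assumptions made for illustration. The sketch covers the 80%/20% random split, the independent draw of per-feature decrease and increase weights from U(0.5, 2), and the two actionability rules (a non-decreasing column such as age, a fixed column such as sex).

```python
import random


def split_indices(n_samples, rng, train_fraction=0.8):
    """Randomly split sample indices into training and test parts."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = round(train_fraction * n_samples)
    return indices[:cut], indices[cut:]


def draw_weights(n_features, rng, low=0.5, high=2.0):
    """Draw a (decrease, increase) weight pair per feature from U(low, high)."""
    return [(rng.uniform(low, high), rng.uniform(low, high))
            for _ in range(n_features)]


def is_actionable(origin, candidate, non_decreasing=(), fixed=()):
    """Check the assumed actionability rules on the given column indices."""
    if any(candidate[j] < origin[j] for j in non_decreasing):
        return False
    return all(candidate[j] == origin[j] for j in fixed)


def weighted_cost(origin, candidate, weights):
    """Assumed cost: increases are charged c_plus, decreases c_minus."""
    total = 0.0
    for x0, x1, (c_minus, c_plus) in zip(origin, candidate, weights):
        delta = x1 - x0
        total += c_plus * delta if delta > 0 else c_minus * (-delta)
    return total


rng = random.Random(0)  # fixed seed so the sketch is repeatable
train_idx, test_idx = split_indices(1000, rng)
weights = draw_weights(3, rng)

# Hypothetical columns: 0 = age, 1 = sex, 2 = some numerical feature.
origin = [30.0, 1.0, 5.0]
candidate = [32.0, 1.0, 4.0]
if is_actionable(origin, candidate, non_decreasing=(0,), fixed=(1,)):
    print(weighted_cost(origin, candidate, weights))
```

Separate weights for increases and decreases let the cost reflect that moving a feature up may be harder or easier than moving it down, which is one way to read the phrase "differences of actionability among features".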