The Dangers of Post-hoc Interpretability: Unjustified Counterfactual Explanations
Authors: Thibault Laugel, Marie-Jeanne Lesot, Christophe Marsala, Xavier Renard, Marcin Detyniecki
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply this test to several datasets and classifiers and show that the risk of generating undesirable counterfactual examples is high. Additionally, we design a second test and show that state of the art post-hoc counterfactual approaches may generate unjustified explanations. The results of the Local Risk Assessment procedure are shown in Table 1. The results of the VE procedure are shown in Table 2. |
| Researcher Affiliation | Collaboration | Thibault Laugel (1), Marie-Jeanne Lesot (1), Christophe Marsala (1), Xavier Renard (2) and Marcin Detyniecki (1,2,3). (1) Sorbonne Université, CNRS, Laboratoire d'Informatique de Paris 6, LIP6, F-75005 Paris, France; (2) AXA, Paris, France; (3) Polish Academy of Science, IBS PAN, Warsaw, Poland. thibault.laugel@lip6.fr |
| Pseudocode | Yes | Algorithm 1 Local risk assessment (a hedged sketch of an LRA-style test appears below the table) |
| Open Source Code | Yes | The obtained results and code to reproduce them are available in an online repository (https://github.com/thibaultlaugel/truce). |
| Open Datasets | Yes | The datasets considered for these experiments include 2 low-dimensional datasets (half-moons and iris) as well as 2 real datasets: Boston Housing [Harrison and Rubinfeld, 1978] and Propublica Recidivism [Larson et al., 2016]. |
| Dataset Splits | No | The paper only mentions a 'train-test split of the data is performed with 70%-30% proportion' but does not specify a validation set or its split percentage. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or cloud instance types) were mentioned for the experimental setup. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names like PyTorch, TensorFlow, or scikit-learn with their versions) were provided. |
| Experiment Setup | Yes | For each considered dataset, a train-test split of the data is performed with 70%-30% proportion, and a binary classifier is trained. To mitigate the impact of the classifier choice, we use the same classifier for every dataset, a random forest (RF) with 200 trees. However, we also train a Support Vector classifier (SVC) with Gaussian kernel on the Boston dataset (see below) to make sure the highlighted issue is not a characteristic feature of random forests. A counterfactual example is generated using HCLS [Lash et al., 2017] with budget B = d(x, b0). (A minimal scikit-learn sketch of this setup follows the table.) |
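For reference, the sketch below shows what an LRA-style test could look like. This is not the authors' Algorithm 1: the sampling radius `radius`, chaining tolerance `eps`, sample size `n`, and the use of an epsilon-graph with connected components to approximate epsilon-chaining are all illustrative assumptions layered on the paper's high-level description (sample points around an instance, keep the ones the classifier labels the same way, and check which of them connect back to the instance through a chain of nearby same-class points).

```python
# Hedged sketch of a Local Risk Assessment (LRA)-style test.
# Assumptions (not taken from the paper): the sampling radius,
# the epsilon used for chaining, and the sample size are illustrative.
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import radius_neighbors_graph


def local_risk_assessment(clf, a, radius, n=1000, eps=0.05, rng=None):
    """Estimate the risk of unjustified regions around instance `a`.

    Samples n points uniformly in the ball B(a, radius), keeps those the
    classifier labels like `a`, and checks which of them connect to `a`
    through an epsilon-chain of same-class samples. Returns the fraction
    of same-class samples that are NOT epsilon-connected to `a`.
    """
    rng = np.random.default_rng(rng)
    d = a.shape[0]
    # Uniform sampling in a d-ball: random direction on the sphere,
    # radius drawn as U^(1/d) to keep the density uniform in volume.
    directions = rng.normal(size=(n, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = radius * rng.uniform(size=(n, 1)) ** (1.0 / d)
    samples = a + directions * radii

    y_a = clf.predict(a.reshape(1, -1))[0]
    same = samples[clf.predict(samples) == y_a]
    if len(same) == 0:
        return 0.0  # no same-class samples; risk undefined, return 0 for simplicity

    # Epsilon-graph over `a` and the same-class samples; points in the
    # same connected component as `a` are treated as justified.
    points = np.vstack([a, same])
    graph = radius_neighbors_graph(points, radius=eps, mode="connectivity")
    _, labels = connected_components(graph, directed=False)
    justified = np.sum(labels[1:] == labels[0])
    return 1.0 - justified / len(same)
```

Under these assumptions, a risk near 1 would mean most same-class points sampled around `a` sit in pockets disconnected from it, i.e., candidate unjustified regions in the sense the paper tests for.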
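The reported setup (70%-30% split, a 200-tree random forest for every dataset, and a Gaussian-kernel SVC as a control) maps directly onto scikit-learn. A minimal sketch, using the half-moons dataset from the paper's list and arbitrary random seeds as stand-ins; in the paper the SVC control is run on the Boston data only, whereas here it is fit on the same toy data purely for illustration:

```python
# Hedged sketch of the experimental setup described above: 70%-30%
# train-test split and a 200-tree random forest. The half-moons data
# and the random seeds are illustrative stand-ins.
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Same classifier for every dataset: a random forest with 200 trees.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)

# Control classifier (Boston only in the paper): an SVC with a
# Gaussian (RBF) kernel, to check the issue is not forest-specific.
svc = SVC(kernel="rbf")
svc.fit(X_train, y_train)

print("RF test accuracy: ", rf.score(X_test, y_test))
print("SVC test accuracy:", svc.score(X_test, y_test))
```

Note that no validation split appears here, consistent with the Dataset Splits row: the paper only specifies the 70%-30% train-test proportion.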