The Dangers of Post-hoc Interpretability: Unjustified Counterfactual Explanations

Authors: Thibault Laugel, Marie-Jeanne Lesot, Christophe Marsala, Xavier Renard, Marcin Detyniecki

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply this test to several datasets and classifiers and show that the risk of generating undesirable counterfactual examples is high. Additionally, we design a second test and show that state of the art post-hoc counterfactual approaches may generate unjustified explanations. The results of the Local Risk Assessment procedure are shown in Table 1. The results of the VE procedure are shown in Table 2.
Researcher Affiliation | Collaboration | Thibault Laugel1, Marie-Jeanne Lesot1, Christophe Marsala1, Xavier Renard2 and Marcin Detyniecki1,2,3; 1Sorbonne Université, CNRS, Laboratoire d'Informatique de Paris 6, LIP6, F-75005 Paris, France; 2AXA, Paris, France; 3Polish Academy of Science, IBS PAN, Warsaw, Poland; thibault.laugel@lip6.fr
Pseudocode | Yes | Algorithm 1: Local risk assessment (a hedged sketch in the spirit of this procedure appears after this table).
Open Source Code | Yes | The obtained results and code to reproduce them are available in an online repository (https://github.com/thibaultlaugel/truce).
Open Datasets | Yes | The datasets considered for these experiments include 2 low-dimensional datasets (half-moons and iris) as well as 2 real datasets: Boston Housing [Harrison and Rubinfeld, 1978] and Propublica Recidivism [Larson et al., 2016] (see the loading sketch after this table).
Dataset Splits | No | The paper only states that 'a train-test split of the data is performed with 70%-30% proportion' and does not specify a validation set or its split percentage.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) were mentioned for the experimental setup.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., PyTorch, TensorFlow, or scikit-learn versions) were provided.
Experiment Setup | Yes | For each considered dataset, a train-test split of the data is performed with a 70%-30% proportion, and a binary classifier is trained. To mitigate the impact the choice of classifier would make, the same classifier is used for every dataset: a random forest (RF) with 200 trees. A Support Vector classifier (SVC) with Gaussian kernel is also trained on one of the datasets (Boston; see below) to make sure the highlighted issue is not a characteristic feature of random forests. A counterfactual example is generated using HCLS [Lash et al., 2017] with budget B = d(x, b0). (A sketch of this setup follows the table.)
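
For reference, a minimal way to load the two low-dimensional datasets with scikit-learn. The sample size and noise level for half-moons are illustrative assumptions; the two real datasets are not bundled here (the Boston Housing loader has been removed from recent scikit-learn releases, and the Propublica Recidivism data is distributed by ProPublica), so they would need to be fetched separately.

```python
# Load the two low-dimensional datasets mentioned in the paper.
# Boston Housing and Propublica Recidivism must be obtained from external sources.
from sklearn.datasets import load_iris, make_moons

# n_samples and noise are illustrative choices, not values reported in the paper.
X_moons, y_moons = make_moons(n_samples=1000, noise=0.2, random_state=0)
X_iris, y_iris = load_iris(return_X_y=True)
```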
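A minimal sketch of the stated protocol on the half-moons data, reusing the variables from the loading snippet above. The 70%-30% split, the 200-tree random forest, and the RBF-kernel SVC come from the paper's description; the random_state values are illustrative assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# 70%-30% train-test split, as reported; no validation set is described.
X_train, X_test, y_train, y_test = train_test_split(
    X_moons, y_moons, test_size=0.3, random_state=0)

# Same classifier for every dataset: a random forest with 200 trees.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Control experiment: an SVC with Gaussian (RBF) kernel, trained on Boston in the paper.
svc = SVC(kernel="rbf").fit(X_train, y_train)
```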
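The paper's Algorithm 1 (Local Risk Assessment) is not reproduced verbatim here. The following is a hedged sketch of a test in that spirit, assuming that candidate counterfactuals are sampled uniformly in a ball around the instance whose radius is the distance to its closest correctly classified training "enemy", and that an epsilon-chain of same-class points stands in for the paper's continuous-path justification criterion. The function name, the sampling scheme, and the eps parameter are illustrative choices, not the authors' exact ones.

```python
import numpy as np

def local_risk_assessment(x, clf, X_train, y_train, n_samples=1000, eps=0.1, rng=None):
    """Estimate the share of locally generated 'enemy' points that cannot be chained
    back to correctly classified training instances of the same class (a sketch, not
    the paper's exact Algorithm 1)."""
    rng = np.random.default_rng(rng)
    pred_x = clf.predict(x.reshape(1, -1))[0]

    # Training "enemies": correctly classified instances of a different class than x's prediction.
    enemies_train = X_train[(y_train != pred_x) & (clf.predict(X_train) == y_train)]
    radius = np.min(np.linalg.norm(enemies_train - x, axis=1))

    # Sample candidates uniformly in the hyperball B(x, radius) (assumed sampling scheme).
    d = x.shape[0]
    directions = rng.normal(size=(n_samples, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = radius * rng.uniform(size=(n_samples, 1)) ** (1.0 / d)
    candidates = x + directions * radii

    # Keep generated enemies: candidates the black box labels differently from x.
    gen_enemies = candidates[clf.predict(candidates) != pred_x]
    if len(gen_enemies) == 0:
        return 0.0

    # Connectivity check (assumption: an eps-chain replaces the continuous-path criterion).
    # A generated enemy is "justified" if it lies in the same eps-connected component
    # as at least one training enemy.
    pool = np.vstack([gen_enemies, enemies_train])
    n_gen = len(gen_enemies)
    adj = np.linalg.norm(pool[:, None, :] - pool[None, :, :], axis=-1) <= eps
    justified = np.zeros(len(pool), dtype=bool)
    frontier = list(range(n_gen, len(pool)))  # start the flood fill from training enemies
    justified[frontier] = True
    while frontier:
        i = frontier.pop()
        for j in np.where(adj[i] & ~justified)[0]:
            justified[j] = True
            frontier.append(j)

    # Risk: fraction of generated enemies that no chain links back to the training data.
    return 1.0 - justified[:n_gen].mean()

# Example usage with the random forest trained above.
risk = local_risk_assessment(X_test[0], rf, X_train, y_train)
```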