Counterfactual Explanations Can Be Manipulated

Authors: Dylan Slack, Sophie Hilgard, Himabindu Lakkaraju, Sameer Singh

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform experiments on loan and violent crime prediction data sets where certain subgroups achieve up to 20x lower cost recourse under the perturbation. These results raise concerns regarding the dependability of current counterfactual explanation techniques, which we hope will inspire investigations in robust counterfactual explanations."
Researcher Affiliation | Academia | "Dylan Slack, UC Irvine, dslack@uci.edu; Sophie Hilgard, Harvard University, ash798@g.harvard.edu; Himabindu Lakkaraju, Harvard University, hlakkaraju@hbs.edu; Sameer Singh, UC Irvine, sameer@uci.edu"
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | "Project Page: https://dylanslacks.website/cfe/" (This is a project page, not a direct link to a source code repository for the methodology.)
Open Datasets | Yes | "We use two data sets: Communities and Crime and the German Credit datasets [21], as they are commonly used benchmarks in both the counterfactual explanation and fairness literature [19, 22]. Both these datasets are in the public domain." [21] Dheeru Dua and Casey Graff. UCI Machine Learning Repository, 2017. URL http://archive.ics.uci.edu/ml.
Dataset Splits | No | "We preprocess the data as in Slack et al. [24], and apply 0 mean, unit variance scaling to the features and perform an 80/20 split on the data to create training and testing sets."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions software components like the Adam optimizer and the official DiCE implementation, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | "We use feed-forward neural networks as the adversarial model consisting of 4 layers of 200 nodes with the tanh activation function, the Adam optimizer, and using cross-entropy as the loss L. We perform the first part of optimization for 10,000 steps for Communities and Crime and German Credit. We train the second part of the optimization for 15 steps. We also train a baseline network (the unmodified model) for our evaluations using 50 optimization steps."
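The preprocessing quoted under "Dataset Splits" (zero-mean, unit-variance scaling plus an 80/20 train/test split) can be sketched as follows. This is a generic reconstruction, not the authors' code: the random seed, the feature dimensions, and fitting the scaler on the training portion only are all assumptions the paper does not specify.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the tabular features; the paper uses the
# Communities and Crime and German Credit datasets instead.
rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(500, 10))
y = rng.integers(0, 2, size=500)

# 80/20 split into training and testing sets, as stated in the paper
# (random_state is an assumption; the paper reports no seed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Zero-mean, unit-variance scaling. Fitting on the training set only
# is standard practice; the paper does not say which portion was used.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```

After scaling, each training-set feature has mean 0 and standard deviation 1; the test set is transformed with the training-set statistics, so its moments are only approximately standardized.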
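The architecture quoted under "Experiment Setup" (four layers of 200 tanh units, Adam, cross-entropy loss) can be sketched in PyTorch as below. This is a hedged reconstruction: the input width, output head, batch size, and the reading of "4 layers" as four hidden layers are assumptions, and the random data stands in for the real training loop.

```python
import torch
import torch.nn as nn


def build_adversarial_model(n_features: int, n_classes: int = 2) -> nn.Sequential:
    """Feed-forward net matching the quoted setup: 4 hidden layers of
    200 nodes with tanh activations. Input/output widths are assumed."""
    layers, in_dim = [], n_features
    for _ in range(4):
        layers += [nn.Linear(in_dim, 200), nn.Tanh()]
        in_dim = 200
    layers.append(nn.Linear(in_dim, n_classes))
    return nn.Sequential(*layers)


model = build_adversarial_model(n_features=20)  # feature count is illustrative
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()  # the cross-entropy loss L from the paper

# One illustrative optimization step on random data; the paper runs
# 10,000 steps for the first stage and 15 for the second.
x = torch.randn(32, 20)
y = torch.randint(0, 2, (32,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```

The same loop, repeated for the step counts quoted above, would reproduce the training schedule the report extracts; the 50-step baseline network uses an unmodified model of the same shape.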