Unfooling Perturbation-Based Post Hoc Explainers

Authors: Zachariah Carmichael, Walter J. Scheirer

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Each reproducibility variable below is listed with its assessed result, followed by the LLM response quoting the supporting evidence from the paper.
Research Type: Experimental. "We demonstrate that our approach successfully detects whether a black box system adversarially conceals its decision-making process and mitigates the adversarial attack on real-world data for the prevalent explainers, LIME and SHAP."

Researcher Affiliation: Academia. "University of Notre Dame; zcarmich@nd.edu, walter.scheirer@nd.edu"
Pseudocode: Yes. "Algorithms 1 (KNN-CAD.fit) and 2 (KNN-CAD.score_samples) formalize this process. [...] In Algorithm 3, the procedure for adversarial attack detection, CAD-Detect, is detailed. [...] The algorithm for defending against adversarial attacks, CAD-Defend, is detailed in Algorithm 4."
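To make the named interface concrete, here is a minimal Python sketch of a kNN-based anomaly scorer exposing fit and score_samples. The class body, the use of mean k-nearest-neighbor distance as the nonconformity score, and the default k are illustrative assumptions, not the paper's exact Algorithms 1 and 2.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


class KNNCAD:
    """Sketch of a kNN conformal anomaly detector (cf. Algorithms 1-2)."""

    def __init__(self, k=5):
        self.k = k
        self._nn = NearestNeighbors(n_neighbors=k)

    def fit(self, X):
        # KNN-CAD.fit: memorize the reference (in-distribution) data.
        self._nn.fit(np.asarray(X))
        return self

    def score_samples(self, X):
        # KNN-CAD.score_samples: nonconformity as the mean distance to
        # the k nearest reference samples (higher = more anomalous).
        dists, _ = self._nn.kneighbors(np.asarray(X))
        return dists.mean(axis=1)
```

In outline, CAD-Detect (Algorithm 3) applies such scores to the perturbed queries an explainer sends to the black box to decide whether the model is adversarially concealing its behavior; the exact test, and the CAD-Defend mitigation, are specified in the paper's Algorithms 3 and 4.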
Open Source Code: Yes. "The code for this work is available at https://github.com/craymichael/unfooling."
Open Datasets: Yes. "We consider three real-world high-stakes data sets to evaluate our approach: The Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) data set was collected by ProPublica in 2016 for defendants from Broward County, Florida (Angwin et al. 2016). The German Credit data set, donated to the University of California Irvine (UCI) machine learning repository in 1994, comprises a set of attributes for German individuals and the corresponding lender risk (Dua and Graff 2017). The Communities and Crime data set combines socioeconomic US census data (1990), US Law Enforcement Management and Administrative Statistics (LEMAS) survey data (1990), and US FBI Uniform Crime Reporting (UCR) data (1995) (Redmond and Baveja 2002)."
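All three data sets are publicly available, though the paper's excerpt does not say how they were fetched programmatically. As one hedged example, the German Credit data is mirrored on OpenML as "credit-g" and can be loaded through scikit-learn; COMPAS and Communities and Crime are distributed separately (ProPublica's GitHub repository and the UCI repository, respectively).

```python
from sklearn.datasets import fetch_openml

# German Credit via its OpenML mirror ("credit-g"); an illustrative
# route only, not necessarily the one the authors used.
german = fetch_openml(name="credit-g", version=1, as_frame=True)
X, y = german.data, german.target
print(X.shape)           # (1000, 20)
print(y.value_counts())  # good / bad lender-risk labels
```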
Dataset Splits: Yes. "For each data set, a sample size of N = 10,000 was randomly sampled without replacement. The sample was then partitioned into 70% training and 30% testing."
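The quoted protocol is straightforward to reproduce. The snippet below shows it on placeholder data, since the excerpt does not include the authors' actual sampling code.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder frame standing in for one of the three data sets.
df = pd.DataFrame(np.random.default_rng(42).normal(size=(50_000, 8)))

# N = 10,000 drawn without replacement, then a 70% / 30% partition.
sample = df.sample(n=10_000, replace=False, random_state=42)
train, test = train_test_split(sample, test_size=0.30, random_state=42)
print(len(train), len(test))  # 7000 3000
```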
Hardware Specification: No. The paper does not report the hardware used for the experiments (e.g., CPU or GPU models, or memory capacity).
Software Dependencies: Yes. "All code was written in Python 3.9 using numpy (Harris et al. 2020), scikit-learn (Pedregosa et al. 2011), pandas (McKinney 2010), and SciPy (Virtanen et al. 2020)."
Experiment Setup: Yes. "For each data set, a sample size of N = 10,000 was randomly sampled without replacement. The sample was then partitioned into 70% training and 30% testing. [...] For LIME and SHAP, the number of perturbations was n_p = 1,000 and n_p = 100, respectively. [...] The random seeds for all libraries were set to 42."
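The reported perturbation budgets map directly onto the standard lime and shap tabular APIs (num_samples and nsamples, respectively). The sketch below assumes those interfaces and uses a placeholder model and data; it is not the authors' experiment harness.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
import lime.lime_tabular
import shap

np.random.seed(42)  # "The random seeds for all libraries were set to 42."

# Placeholder black box and data for illustration.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X, y)

# LIME with n_p = 1,000 perturbations per explanation.
lime_explainer = lime.lime_tabular.LimeTabularExplainer(
    X, mode="classification", random_state=42
)
lime_exp = lime_explainer.explain_instance(
    X[0], model.predict_proba, num_samples=1_000
)

# Kernel SHAP with n_p = 100 perturbations per explanation.
shap_explainer = shap.KernelExplainer(model.predict_proba, X[:50])
shap_values = shap_explainer.shap_values(X[:1], nsamples=100)
```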