Unfooling Perturbation-Based Post Hoc Explainers
Authors: Zachariah Carmichael, Walter J. Scheirer
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our approach successfully detects whether a black box system adversarially conceals its decision-making process and mitigates the adversarial attack on real-world data for the prevalent explainers, LIME and SHAP. |
| Researcher Affiliation | Academia | University of Notre Dame zcarmich@nd.edu, walter.scheirer@nd.edu |
| Pseudocode | Yes | Algorithms 1 (KNN-CAD.fit) and 2 (KNN-CAD.score_samples) formalize this process. [...] In Algorithm 3, the procedure for adversarial attack detection, CAD-Detect, is detailed. [...] The algorithm for defending against adversarial attacks, CAD-Defend, is detailed in Algorithm 4. (A hedged sketch of a k-NN conformal anomaly detector with this fit/score_samples interface follows the table.) |
| Open Source Code | Yes | The code for this work is available at https://github.com/craymichael/unfooling. |
| Open Datasets | Yes | We consider three real-world high-stakes data sets to evaluate our approach: The Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) dataset was collected by ProPublica in 2016 for defendants from Broward County, Florida (Angwin et al. 2016). The German Credit data set, donated to the University of California Irvine (UCI) machine learning repository in 1994, comprises a set of attributes for German individuals and the corresponding lender risk (Dua and Graff 2017). The Communities and Crime data set combines socioeconomic US census data (1990), US Law Enforcement Management and Administrative Statistics (LEMAS) survey data (1990), and US FBI Uniform Crime Reporting (UCR) data (1995) (Redmond and Baveja 2002). |
| Dataset Splits | Yes | For each data set, a sample size of N = 10,000 was randomly sampled without replacement. The sample was then partitioned into 70% training and 30% testing. (See the data-preparation sketch below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments. |
| Software Dependencies | Yes | All code was written in Python 3.9 using numpy (Harris et al. 2020), scikit-learn (Pedregosa et al. 2011), pandas (McKinney 2010), and SciPy (Virtanen et al. 2020). |
| Experiment Setup | Yes | For each data set, a sample size of N = 10,000 was randomly sampled without replacement. The sample was then partitioned into 70% training and 30% testing. [...] For LIME and SHAP, the number of perturbations was n_p = 1,000 and n_p = 100, respectively. [...] The random seeds for all libraries were set to 42. (Hedged sketches of the split and the explainer settings follow the table.) |
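
The Pseudocode row names KNN-CAD.fit and KNN-CAD.score_samples (Algorithms 1 and 2). The snippet below is only a minimal sketch of a generic inductive k-NN conformal anomaly detector with that interface, not a reproduction of the paper's algorithms: the summed k-nearest-neighbor distance as the non-conformity measure, the proper/calibration split, and the p-value convention are assumptions made for illustration.

```python
# Sketch of a k-NN conformal anomaly detector (assumed design, not the
# paper's exact Algorithms 1-2).
import numpy as np
from sklearn.neighbors import NearestNeighbors


class KNNCAD:
    def __init__(self, k=5):
        self.k = k

    def fit(self, X_proper, X_calib):
        # Index the proper training set and pre-compute calibration
        # non-conformity scores (Algorithm 1 analogue).
        self._nn = NearestNeighbors(n_neighbors=self.k).fit(X_proper)
        self._calib_scores = self._nonconformity(X_calib)
        return self

    def _nonconformity(self, X):
        # Assumed non-conformity measure: sum of distances to the k
        # nearest proper-training neighbors.
        dists, _ = self._nn.kneighbors(X, n_neighbors=self.k)
        return dists.sum(axis=1)

    def score_samples(self, X):
        # Conformal p-value per sample (Algorithm 2 analogue): fraction of
        # calibration scores at least as non-conforming as the test score.
        # Small p-values indicate anomalous (out-of-distribution) samples.
        scores = self._nonconformity(X)
        calib = self._calib_scores
        return np.array(
            [((calib >= s).sum() + 1) / (len(calib) + 1) for s in scores]
        )
```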
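
The Dataset Splits and Experiment Setup rows fully describe the data preparation: sample N = 10,000 rows without replacement, then split 70/30, with all seeds set to 42. Below is a minimal sketch of that procedure using the pandas and scikit-learn dependencies listed in the paper; the CSV path and target-column name are placeholders, not taken from the paper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

SEED = 42  # "The random seeds for all libraries were set to 42."

# Placeholder file and target column; the paper evaluates COMPAS,
# German Credit, and Communities and Crime.
df = pd.read_csv("dataset.csv")

# N = 10,000 rows sampled without replacement, then a 70/30 split.
sample = df.sample(n=10_000, replace=False, random_state=SEED)
X = sample.drop(columns=["label"])  # hypothetical target column
y = sample["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=SEED
)
```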
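
The Experiment Setup row also reports n_p = 1,000 perturbations for LIME and n_p = 100 for SHAP. The sketch below shows how those settings map onto the common lime and shap package APIs; the synthetic data, random-forest black box, and variable names are placeholders and not the paper's pipeline.

```python
import numpy as np
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier

SEED = 42
np.random.seed(SEED)

# Placeholder tabular data and black-box model for illustration only.
X_train = np.random.rand(500, 8)
y_train = (X_train[:, 0] > 0.5).astype(int)
x_query = np.random.rand(8)  # instance to explain
model = RandomForestClassifier(random_state=SEED).fit(X_train, y_train)

# LIME with n_p = 1,000 perturbations.
lime_explainer = LimeTabularExplainer(X_train, random_state=SEED)
lime_exp = lime_explainer.explain_instance(
    x_query, model.predict_proba, num_samples=1_000
)

# KernelSHAP with n_p = 100 perturbations over a sampled background set.
shap_explainer = shap.KernelExplainer(
    model.predict_proba, shap.sample(X_train, 100)
)
shap_values = shap_explainer.shap_values(x_query, nsamples=100)
```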