Fooling SHAP with Stealthily Biased Sampling
Authors: Gabriel Laberge, Ulrich Aïvodji, Satoshi Hara, Mario Marchand, Foutse Khomh
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally (Section 5), we illustrate the impact of the proposed manipulation attack on a synthetic dataset and four popular datasets, namely Adult Income, COMPAS, Marketing, and Communities. We observed that the proposed attack can reduce the importance of a sensitive feature while keeping the data manipulation undetected by the audit. |
| Researcher Affiliation | Academia | (1) Polytechnique Montréal, Québec; (2) École de technologie supérieure, Québec; (3) Osaka University, Japan; (4) Université Laval, Québec |
| Pseudocode | Yes | Algorithm 1: Compute non-uniform weights |
| Open Source Code | Yes | The source code of all our experiments is available online³. |
| Open Datasets | Yes | We consider four standard datasets from the FAccT literature, namely COMPAS, Adult-Income, Marketing, and Communities. |
| Dataset Splits | Yes | The datasets were first divided into train/test subsets with ratio 4:5. The models were trained on the training set and evaluated on the test set. All categorical features for COMPAS, Adult, and Marketing were one-hot-encoded, which resulted in a total of 11, 40, and 61 columns for each dataset respectively. A simple 50-step random search was conducted to fine-tune the hyper-parameters with cross-validation on the training set. (A minimal code sketch of this pipeline follows the table.) |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU types, or memory) used for running experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions the 'SHAP Python library' and that some parts were rewritten in 'C++', but no specific version numbers for these or other software dependencies are provided. |
| Experiment Setup | Yes | Three models were considered for the four datasets: Multi-Layered Perceptrons (MLP), Random Forests (RF), and eXtreme Gradient Boosted trees (XGB). One model of each type was fitted on each dataset for 5 different train/test split seeds, resulting in 60 models total. A simple 50-step random search was conducted to fine-tune the hyper-parameters with cross-validation on the training set. (A second sketch of this training loop follows the table.) |
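
The Dataset Splits row describes a standard preprocessing pipeline. Below is a minimal scikit-learn sketch of what it could look like; the file name and target column are hypothetical (neither is named in the report), and reading "ratio 4:5" as an 80/20 train/test split is an assumption, since the quoted text does not define the ratio's convention.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file and target column -- neither is named in the report.
df = pd.read_csv("adult_income.csv")
y = df["income"]
X = pd.get_dummies(df.drop(columns=["income"]))  # one-hot-encode categorical features

# Reading "ratio 4:5" as an 80/20 train/test split is an assumption here.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0
)
```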
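
The Experiment Setup row similarly maps onto a randomized hyper-parameter search. The sketch below fits one MLP, RF, and XGB model per train/test seed with a 50-iteration random search cross-validated on the training set, as the table describes. The search spaces and the stand-in data are invented for illustration (the paper's grids are not quoted), and the outer loop over the four datasets is elided.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

# Stand-in data; in the paper each of the four datasets would be used here.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Hypothetical search spaces -- the paper does not quote its grids.
models = {
    "MLP": (MLPClassifier(max_iter=500),
            {"hidden_layer_sizes": [(50,), (100,), (100, 50)],
             "alpha": uniform(1e-5, 1e-2)}),
    "RF": (RandomForestClassifier(),
           {"n_estimators": randint(50, 500), "max_depth": randint(2, 20)}),
    "XGB": (XGBClassifier(),
            {"n_estimators": randint(50, 500), "learning_rate": uniform(0.01, 0.3)}),
}

fitted = {}
for seed in range(5):  # 5 train/test split seeds, as described in the table
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.8, random_state=seed)
    for name, (estimator, space) in models.items():
        # 50-step random search with cross-validation on the training set
        search = RandomizedSearchCV(estimator, space, n_iter=50, cv=5, random_state=seed)
        search.fit(X_tr, y_tr)
        fitted[(name, seed)] = search.best_estimator_
```

With 3 model types, 4 datasets, and 5 seeds, this loop yields the 3 × 4 × 5 = 60 models the table reports.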