Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Fooling SHAP with Stealthily Biased Sampling
Authors: Gabriel Laberge, Ulrich Aïvodji, Satoshi Hara, Mario Marchand, Foutse Khomh
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally (Section 5), we illustrate the impact of the proposed manipulation attack on a synthetic dataset and four popular datasets, namely Adult Income, COMPAS, Marketing, and Communities. We observed that the proposed attack can reduce the importance of a sensitive feature while keeping the data manipulation undetected by the audit. |
| Researcher Affiliation | Academia | ¹Polytechnique Montréal, Québec; ²École de technologie supérieure, Québec; ³Osaka University, Japan; ⁴Université de Laval à Québec |
| Pseudocode | Yes | Algorithm 1 Compute non-uniform weights |
| Open Source Code | Yes | The source code of all our experiments is available online³. |
| Open Datasets | Yes | We consider four standard datasets from the FAccT literature, namely COMPAS, Adult-Income, Marketing, and Communities. |
| Dataset Splits | Yes | The datasets were first divided into train/test subsets with ratio 4:1. The models were trained on the training set and evaluated on the test set. All categorical features for COMPAS, Adult, and Marketing were one-hot-encoded, which resulted in a total of 11, 40, and 61 columns for each dataset respectively. A simple 50 steps random search was conducted to fine-tune the hyper-parameters with cross-validation on the training set. |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU types, or memory) used for running experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions the 'SHAP Python library' and that some parts were rewritten in 'C++', but no specific version numbers for these or other software dependencies are provided. |
| Experiment Setup | Yes | Three models were considered for the four datasets: Multi-Layered Perceptrons (MLP), Random Forests (RF), and eXtreme Gradient Boosted trees (XGB). One model of each type was fitted on each dataset for 5 different train/test split seeds, resulting in 60 models total. A simple 50 steps random search was conducted to fine-tune the hyper-parameters with cross-validation on the training set. |
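
As a quick sanity check on the Experiment Setup row, the count of 60 models follows from crossing the three model types with the four datasets named in the Open Datasets row and the 5 split seeds. The sketch below only enumerates that grid. The seed values 0 to 4 and the function name `enumerate_runs` are illustrative assumptions, not taken from the paper or its released code.

```python
from itertools import product

# Model and dataset names are copied from the table above.
MODELS = ["MLP", "RF", "XGB"]
DATASETS = ["COMPAS", "Adult-Income", "Marketing", "Communities"]
# Assumption: the summary says 5 seeds but not which ones; 0..4 are placeholders.
SEEDS = range(5)


def enumerate_runs():
    """Return one (dataset, model, seed) triple per fitted model."""
    return list(product(DATASETS, MODELS, SEEDS))


runs = enumerate_runs()
print(len(runs))  # 4 datasets x 3 models x 5 seeds = 60
```

Each triple stands for one fitted model, so the length of the list matches the "60 models total" stated in the row.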