Washing The Unwashable: On The (Im)possibility of Fairwashing Detection
Authors: Ali Shahin Shamsabadi, Mohammad Yaghini, Natalie Dullerud, Sierra Wyllie, Ulrich Aïvodji, Aisha Alaagib, Sébastien Gambs, Nicolas Papernot
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that this divergence is significantly larger in purposefully fairwashed interpretable models than in honest ones. Furthermore, we show that our detector is robust to an informed adversary trying to bypass it. The code implementing FRAUD-Detect is available at https://github.com/cleverhans-lab/FRAUD-Detect. Our empirical results demonstrate that FRAUD-Detect successfully detects fairwashing based on the KL divergence between subpopulation-wise confusion matrices (Section 6). We show that the divergence over subpopulation confusion matrices can vary by over 0.6 between honest and fairwashed interpretable models. We assess the performance of FRAUD-Detect using a diverse set of black-box architectures, interpretable models and datasets. More precisely, we consider four architectures of black-box models: Deep Neural Networks (DNNs), AdaBoost (AB) [24], Gradient Boosted Decision Trees (XGBs) [17] and Random Forests (RFs) [13]. We evaluate the approach on three real-world datasets corresponding to critical decision systems: Adult Income [22], Bank Marketing [34] and COMPAS [5]. |
| Researcher Affiliation | Academia | Ali Shahin Shamsabadi (The Alan Turing Institute, Vector Institute); Mohammad Yaghini (University of Toronto, Vector Institute); Natalie Dullerud (University of Toronto, Vector Institute); Sierra Wyllie (University of Toronto, Vector Institute); Ulrich Aïvodji (ÉTS Montréal); Aisha Alaagib (University of Toronto, Vector Institute); Sébastien Gambs (Université du Québec à Montréal); Nicolas Papernot (University of Toronto, Vector Institute) |
| Pseudocode | Yes | Algorithm 1 FRAUD-Detect. Input: query access to the interpretable model I(·), suing dataset Xsg, black-box model predictions on the suing set B(Xsg), sensitive attribute a ∈ A, and a threshold > 0. Output: T if fairwashing is detected, F if fairwashing is not detected. (A hedged code sketch of this procedure appears after the table.) |
| Open Source Code | Yes | The code implementing FRAUD-Detect is available at https://github.com/cleverhans-lab/FRAUD-Detect. |
| Open Datasets | Yes | We evaluate the approach on three real-world datasets corresponding to critical decision systems: Adult Income [22], Bank Marketing [34] and COMPAS [5]. |
| Dataset Splits | No | The paper does not explicitly specify distinct training, validation, and test dataset splits with percentages or sample counts. It mentions a "training set (XTr, YTr, A)" for the black-box model and a "suing set Xsg" for auditing the interpretable model, but not a typical validation split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper mentions types of black-box models (Deep Neural Networks, Ada Boost, Gradient Boosted Decision Trees, Random Forests) and interpretable models (Logistic Regression, Decision Trees), but does not specify any software libraries or frameworks with version numbers (e.g., TensorFlow, PyTorch, scikit-learn versions) required for reproducibility. |
| Experiment Setup | No | The paper states that for the informed adversary optimization, they "use the logistic regression loss" for L(I; Xsg), but it does not provide concrete hyperparameter values such as learning rates, batch sizes, number of epochs, or specific optimizer settings for their experiments. It refers to general hyperparameter search or selection but lacks explicit details. |
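The detection rule summarized in Algorithm 1 lends itself to a compact implementation. The sketch below is a minimal illustration, not the authors' code (the actual implementation is in the linked cleverhans-lab/FRAUD-Detect repository). It assumes a binary task and a binary sensitive attribute, builds one smoothed, normalized confusion matrix of interpretable-model predictions against black-box predictions per subgroup of the suing set, and flags fairwashing when the KL divergence between the two subgroup matrices exceeds a user-chosen threshold. The function names and the default threshold value are illustrative assumptions, not values from the paper.

```python
import numpy as np

def subgroup_confusion_matrix(bb_preds, interp_preds, mask, eps=1e-12):
    """Smoothed, normalized 2x2 confusion matrix of interpretable-model
    predictions against black-box predictions on one sensitive subgroup."""
    cm = np.zeros((2, 2))
    for b, i in zip(bb_preds[mask], interp_preds[mask]):
        cm[int(b), int(i)] += 1
    # Smooth with eps so the matrix can be treated as a full-support distribution.
    cm += eps
    return cm / cm.sum()

def kl_divergence(p, q):
    """KL divergence between two flattened confusion-matrix distributions."""
    p, q = p.ravel(), q.ravel()
    return float(np.sum(p * np.log(p / q)))

def fraud_detect(bb_preds, interp_preds, sensitive, tau=0.1):
    """Return (detected, divergence): detected is True when the divergence
    between the two subgroups' confusion matrices exceeds the threshold tau.
    tau=0.1 is a placeholder, not a value reported in the paper."""
    bb_preds = np.asarray(bb_preds)
    interp_preds = np.asarray(interp_preds)
    sensitive = np.asarray(sensitive)
    groups = np.unique(sensitive)
    assert len(groups) == 2, "this sketch assumes a binary sensitive attribute"
    cm_a = subgroup_confusion_matrix(bb_preds, interp_preds, sensitive == groups[0])
    cm_b = subgroup_confusion_matrix(bb_preds, interp_preds, sensitive == groups[1])
    divergence = kl_divergence(cm_a, cm_b)
    return divergence > tau, divergence
```

The epsilon smoothing keeps the KL divergence finite when a confusion-matrix cell is empty on the suing set; the exact divergence variant and the threshold used in the paper's experiments should be taken from the repository rather than from this sketch.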