Washing The Unwashable: On The (Im)possibility of Fairwashing Detection

Authors: Ali Shahin Shamsabadi, Mohammad Yaghini, Natalie Dullerud, Sierra Wyllie, Ulrich Aïvodji, Aisha Alaagib, Sébastien Gambs, Nicolas Papernot

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that this divergence is significantly larger in purposefully fairwashed interpretable models than in honest ones. Furthermore, we show that our detector is robust to an informed adversary trying to bypass our detector. The code implementing FRAUD-Detect is available at https://github.com/cleverhans-lab/FRAUD-Detect. Our empirical results demonstrate that FRAUD-Detect successfully detects fairwashing based on the KL divergence between subpopulation-wise confusion matrices (Section 6). We show that the divergence over subpopulation confusion matrices can vary by over 0.6 between an honest and a fairwashed interpretable model. We assess the performance of FRAUD-Detect using a diverse set of black-box architectures, interpretable models and datasets. More precisely, we consider four architectures of black-box models: Deep Neural Networks (DNN), AdaBoost (AB) [24], Gradient Boosted Decision Trees (XGBs) [17] and Random Forests (RFs) [13]. We evaluate the approach on three real-world datasets corresponding to critical decision systems: Adult Income [22], Bank Marketing [34] and COMPAS [5]. A runnable sketch of this divergence statistic appears after the table.
Researcher Affiliation | Academia | Ali Shahin Shamsabadi (The Alan Turing Institute, Vector Institute); Mohammad Yaghini (University of Toronto, Vector Institute); Natalie Dullerud (University of Toronto, Vector Institute); Sierra Wyllie (University of Toronto, Vector Institute); Ulrich Aïvodji (ÉTS Montréal); Aisha Alaagib (University of Toronto, Vector Institute); Sébastien Gambs (Université du Québec à Montréal); Nicolas Papernot (University of Toronto, Vector Institute)
Pseudocode | Yes | Algorithm 1 FRAUD-Detect. Input: query access to the interpretable model I(·), suing dataset X_sg, black-box model predictions on the suing set B(X_sg), sensitive attribute a ∈ A, and a threshold τ > 0. Output: T if fairwashing is detected, F if fairwashing is not detected. A sketch of this decision rule also follows the table.
Open Source Code | Yes | The code implementing FRAUD-Detect is available at https://github.com/cleverhans-lab/FRAUD-Detect.
Open Datasets | Yes | We evaluate the approach on three real-world datasets corresponding to critical decision systems: Adult Income [22], Bank Marketing [34] and COMPAS [5].
Dataset Splits | No | The paper does not explicitly specify distinct training, validation, and test dataset splits with percentages or sample counts. It mentions a training set (X_Tr, Y_Tr, A) for the black-box model and a suing set X_sg for auditing the interpretable model, but no conventional validation split.
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper mentions types of black-box models (Deep Neural Networks, AdaBoost, Gradient Boosted Decision Trees, Random Forests) and interpretable models (Logistic Regression, Decision Trees), but does not specify any software libraries or frameworks with version numbers (e.g., TensorFlow, PyTorch, or scikit-learn versions) required for reproducibility.
Experiment Setup | No | For the informed-adversary optimization, the paper states that it uses "the logistic regression loss" for L(I; X_sg), but it does not provide concrete hyperparameter values such as learning rates, batch sizes, number of epochs, or specific optimizer settings. It refers to hyperparameter search and selection only in general terms, without explicit details.
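
The divergence statistic referenced in the Research Type row can be made concrete with a short sketch. The reading assumed here (an assumption drawn from the quoted text, not the authors' implementation) is: compute the confusion matrix of the interpretable model's labels I(X_sg) against the black-box labels B(X_sg) separately on each sensitive subgroup of the suing set, normalize each matrix into a probability distribution, and take the KL divergence between the two distributions. The function name and the smoothing constant below are illustrative choices; the reference implementation is at https://github.com/cleverhans-lab/FRAUD-Detect.

import numpy as np
from scipy.stats import entropy
from sklearn.metrics import confusion_matrix


def subpopulation_confusion_divergence(bb_labels, interp_labels, sensitive, eps=1e-12):
    """KL divergence between the normalized confusion matrices of the two subgroups.

    bb_labels     : black-box predictions B(X_sg) on the suing set (0/1 array)
    interp_labels : interpretable-model predictions I(X_sg) on the suing set (0/1 array)
    sensitive     : binary sensitive attribute a for each suing-set point
    eps           : smoothing constant so the divergence stays finite
    """
    bb_labels = np.asarray(bb_labels)
    interp_labels = np.asarray(interp_labels)
    sensitive = np.asarray(sensitive)

    distributions = []
    for group in np.unique(sensitive):
        mask = sensitive == group
        # Confusion matrix of the interpretable model's labels against the
        # black-box's labels, restricted to one sensitive subgroup.
        cm = confusion_matrix(bb_labels[mask], interp_labels[mask], labels=[0, 1]).astype(float)
        flat = cm.ravel() + eps
        distributions.append(flat / flat.sum())

    # KL divergence between the two subgroup distributions.
    return entropy(distributions[0], distributions[1])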
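
Given that statistic, the Algorithm 1 decision rule quoted in the Pseudocode row reduces to a single threshold test. The sketch below assumes the helper defined above and an interpretable model exposing a scikit-learn-style predict method; tau stands in for the threshold from the algorithm's input list.

def fraud_detect(interpretable_model, X_sg, bb_labels, sensitive, tau):
    """Sketch of the Algorithm 1 decision rule: flag fairwashing when the
    subgroup confusion-matrix divergence on the suing set exceeds the threshold.

    interpretable_model : model exposing a scikit-learn-style .predict()
    X_sg                : suing-set features
    bb_labels           : black-box predictions B(X_sg)
    sensitive           : sensitive attribute a for each suing-set point
    tau                 : detection threshold (> 0)
    Returns True if fairwashing is detected, False otherwise.
    """
    interp_labels = interpretable_model.predict(X_sg)
    divergence = subpopulation_confusion_divergence(bb_labels, interp_labels, sensitive)
    return bool(divergence > tau)

Under this reading, choosing tau amounts to deciding how much divergence can be attributed to honest approximation error; the quoted observation that the divergence can differ by over 0.6 between honest and fairwashed surrogates is what leaves room for such a threshold.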