mdfa: Multi-Differential Fairness Auditor for Black Box Classifiers

Authors: Xavier Gitiaux, Huzefa Rangwala

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply mdfa to a recidivism risk assessment classifier and demonstrate that, for individuals with little criminal history, identified African-Americans are three times more likely to be considered at high risk of violent recidivism than similar non-African-Americans. (See also Section 4, Experimental Results.)
Researcher Affiliation | Academia | Xavier Gitiaux and Huzefa Rangwala, George Mason University, {xgitiaux, hrangwal}@gmu.edu
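Pseudocode | Yes | Algorithm 1 Worst Violation Algorithm (WVA)
Input: $\{((x_i, a_i), y_i)\}_{i=1}^{m}$, $C \subseteq 2^{|X|}$, $\xi$, $\alpha$, weights $u$, $y \in \{-1, 1\}$
1: $\alpha_0 = 1$, $u_i^t = u$
2: while $\alpha_t > \alpha$ do
3: $c_t = \arg\min_{c \in C} \frac{1}{m} \sum_{i=1}^{m} u_i^t(x_i)\, l(a_i y_i, c(x_i))$
4–5: [partially lost in extraction] compute $\delta_t$ and $\alpha_t$ from weighted sums over $\{i : c_t(x_i) = 1, y_i = y, a_i = 1\}$, $\{i : c_t(x_i) = 1, y_i = y, a_i = -1\}$ and $\sum_{i : c_t(x_i) = 1, y_i = 1} u_i(x_i)$
6: $t \leftarrow t + 1$, $u_i^t \leftarrow u_i^t + u_i \xi$ if $a_i = y_i$ and $y_i = y$
7: end while
8: Return $\ln(\delta_t)$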
Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code.
Open Datasets | Yes | The data collected by ProPublica in Broward County from 2013 to 2015 contains about 7,000 individuals, along with a risk score and a risk category assigned by COMPAS. We transform the risk category into a binary variable equal to 1 for individuals assigned to the high-risk category (risk score between 8 and 10). The data provides information on each individual's criminal history, misdemeanors, gender, age and race [Pro Publica, 2016]. In addition, the experiment is carried out on three datasets from [Friedler et al., 2018; Kearns et al., 2018]: Adult, with 48,840 individuals; German, with 1,000 individuals; and Crimes, with 1,994 communities.
Dataset Splits | Yes | First, we split the input data 70%/30% into a training set and a test set. Using 5-fold cross-validation, mdfa is trained on four folds, and a grid search looks for the regularization parameters that minimize the maximum mean discrepancy $G_k(u, s)$ and the empirical risk on the fifth fold.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions types of models and classifiers (e.g., neural network, support vector machine, logistic regression) but does not provide specific software names with version numbers for the libraries or frameworks used.
Experiment Setup | Yes | mdfa first uses a neural network with four fully connected layers of 8 neurons to express the weights $u$ as a function of the features $x$, and minimizes the maximum mean discrepancy function $\hat{G}_k(u, s)$. Separately, mdfa is trained using a support vector machine (RBF kernel) on unbalanced data ($\mu = 0.2$), with values of $\delta_m$ varying from 0 to 3.0.
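The extracted Algorithm 1 in the Pseudocode row loses the exact formulas for $\delta_t$ and $\alpha_t$. The Python sketch below shows one hypothetical reading of the quantity the algorithm returns: for a candidate subgroup indicator $c$, $\delta$ is taken as the ratio of the summed weights $u_i$ over $\{i : c(x_i)=1, y_i=y, a_i=1\}$ and $\{i : c(x_i)=1, y_i=y, a_i=-1\}$, and $\alpha$ as $\frac{1}{m}\sum_{i : c(x_i)=1, y_i=1} u_i$. Both formulas and the name `log_violation` are assumptions, not the authors' code, and the argmin and reweighting steps of the loop are not reproduced.

```python
import math
from typing import Callable, Sequence, Tuple


def log_violation(
    xs: Sequence[Sequence[float]],
    a: Sequence[int],
    y: Sequence[int],
    u: Sequence[float],
    c: Callable[[Sequence[float]], int],
    target: int = 1,
) -> Tuple[float, float]:
    """Hypothetical reading of WVA's returned quantity (not the authors' code).

    xs: features x_i; a: protected attribute a_i in {-1, 1};
    y: classifier outcome y_i in {-1, 1}; u: weights u_i(x_i);
    c: subgroup indicator returning 0 or 1; target: the outcome value y.
    Returns (ln(delta), alpha) under the assumptions stated above.
    """
    num = 0.0   # weight of {i : c(x_i)=1, y_i=target, a_i=1}
    den = 0.0   # weight of {i : c(x_i)=1, y_i=target, a_i=-1}
    mass = 0.0  # weight of {i : c(x_i)=1, y_i=1}
    for xi, ai, yi, ui in zip(xs, a, y, u):
        if c(xi) != 1:
            continue  # point lies outside the subgroup
        if yi == target:
            if ai == 1:
                num += ui
            else:
                den += ui
        if yi == 1:
            mass += ui
    if num == 0.0 or den == 0.0:
        raise ValueError("subgroup carries no weight for one value of a")
    alpha = mass / len(xs)  # assumed 1/m normalisation
    return math.log(num / den), alpha


# Toy data: the subgroup x > 0.5 holds three a=1 and one a=-1 positive outcomes.
xs = [[1.0], [1.0], [1.0], [1.0], [0.0], [0.0]]
a = [1, 1, 1, -1, -1, 1]
y = [1, 1, 1, 1, 1, -1]
u = [1.0] * 6
ln_delta, alpha = log_violation(xs, a, y, u, lambda x: 1 if x[0] > 0.5 else 0)
# ln_delta == ln(3), alpha == 4/6
```

With uniform weights this reduces to a log ratio of counts; non-uniform $u$ lets the same function score a subgroup after the balancing step.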
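The binarization described in the Open Datasets row can be written as a one-line rule. This sketch assumes the COMPAS score is an integer decile from 1 to 10; the function name `high_risk_label` is hypothetical.

```python
def high_risk_label(decile_score: int) -> int:
    """Return 1 for the high-risk band (score 8 to 10), else 0."""
    if not 1 <= decile_score <= 10:
        raise ValueError(f"expected a decile score in 1..10, got {decile_score}")
    return 1 if decile_score >= 8 else 0


labels = [high_risk_label(s) for s in [2, 7, 8, 10, 5]]  # [0, 0, 1, 1, 0]
```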
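The protocol in the Dataset Splits row (a 70%/30% train/test split, then 5-fold cross-validation with a grid search on the training part) can be sketched in plain Python. `fit` and `score` are hypothetical callbacks: for mdfa, `score` would stand for the held-out criterion (the maximum mean discrepancy and the empirical risk), but how the two terms are combined is not stated, so the sketch leaves that to the caller. Averaging the held-out score over the five folds is a standard choice, assumed rather than taken from the paper.

```python
import random
from typing import Any, Callable, Iterator, List, Sequence, Tuple


def train_test_indices(n: int, test_frac: float = 0.3, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle 0..n-1 with a fixed seed and split off a test fraction."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def k_fold(indices: Sequence[int], k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) pairs: train on k-1 folds, validate on the last one."""
    folds = [list(indices[j::k]) for j in range(k)]
    for j in range(k):
        train = [i for m, fold in enumerate(folds) if m != j for i in fold]
        yield train, folds[j]


def grid_search(
    train_idx: Sequence[int],
    grid: Sequence[Any],
    fit: Callable[[List[int], Any], Any],      # hypothetical: trains a model
    score: Callable[[Any, List[int]], float],  # hypothetical: held-out criterion
    k: int = 5,
) -> Any:
    """Return the grid value with the lowest mean held-out score."""
    best, best_score = None, float("inf")
    for params in grid:
        total = sum(score(fit(tr, params), va) for tr, va in k_fold(train_idx, k))
        if total / k < best_score:
            best, best_score = params, total / k
    return best


train, test = train_test_indices(100)  # 70 training and 30 test indices
```

The test indices are never passed to `grid_search`, so the chosen regularization parameters are selected on the training portion only.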
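The Experiment Setup row has mdfa choose the weights $u$ by minimizing the maximum mean discrepancy $\hat{G}_k(u, s)$. The exact definition of $\hat{G}_k$, including the role of $s$, is not given in this summary. The sketch below assumes it is the biased (V-statistic) estimate of the squared MMD, under a Gaussian RBF kernel, between the $u$-weighted feature distributions of the $a = 1$ and $a = -1$ groups. The bandwidth parameter `gamma` and the function names are hypothetical, and the weights are passed in directly rather than produced by the four-layer network.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rbf(x: Sequence[float], z: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))


def weighted_mmd2(
    xs: Sequence[Sequence[float]],
    a: Sequence[int],
    u: Sequence[float],
    gamma: float = 1.0,
) -> float:
    """Squared MMD between the u-weighted empirical distributions of a=1 and a=-1."""
    groups: Dict[int, List[Tuple[Sequence[float], float]]] = {}
    for sign in (1, -1):
        pts = [(x, w) for x, ai, w in zip(xs, a, u) if ai == sign]
        total = sum(w for _, w in pts)
        groups[sign] = [(x, w / total) for x, w in pts]  # weights sum to 1 per group

    def mean_kernel(p, q) -> float:
        # Weighted average of k(x, z) over all pairs x in p, z in q.
        return sum(wp * wq * rbf(xp, xq, gamma) for xp, wp in p for xq, wq in q)

    g1, g0 = groups[1], groups[-1]
    return mean_kernel(g1, g1) + mean_kernel(g0, g0) - 2.0 * mean_kernel(g1, g0)


xs = [[0.0], [1.0], [0.0], [1.0]]
a = [1, 1, -1, -1]
unbalanced = weighted_mmd2(xs, a, [3.0, 1.0, 1.0, 1.0])  # groups differ: positive
balanced = weighted_mmd2(xs, a, [3.0, 1.0, 3.0, 1.0])    # reweighted to match: about 0
```

The two calls show why minimizing this quantity over $u$ acts as a balancing step: weights that make the two groups' feature distributions coincide drive the discrepancy to zero.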