Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

mdfa: Multi-Differential Fairness Auditor for Black Box Classifiers

Authors: Xavier Gitiaux, Huzefa Rangwala

IJCAI 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We apply mdfa to a recidivism risk assessment classifier and demonstrate that for individuals with little criminal history, identified African-Americans are three-times more likely to be considered at high risk of violent recidivism than similar non-African-Americans. and 4 Experimental Results
Researcher Affiliation	Academia	Xavier Gitiaux and Huzefa Rangwala George Mason University EMAIL
Pseudocode	Yes	Algorithm 1 Worst Violation Algorithm (WVA) Input: {((xi, ai), yi)}, i = 1, …, m, C ⊂ 2^|X|, ξ, α, weights u, y ∈ {−1, 1} 1: α0 = 1, uit = u 2: while αt > α do 3: ct = argmin c∈C 1 i=1 uit(x)l(aiyi, c(xi)) + i=1,c(xi)=1 yi=y,ai=1 i=1,c(xi)=1 yi=y,ai=−1 i=1,c(xi)=1,yi=1 ui(xi) 6: t ← t + 1, uit ← uit + uiξ if ai = yi and yi = y. 7: end while 8: Return ln(δt).
Open Source Code	No	The paper does not provide any statement or link regarding the public availability of its source code.
Open Datasets	Yes	The data collected by Pro Publica in Broward County from 2013 to 2015 contains 7K individuals along with a risk score and a risk category assigned by COMPAS. We transform the risk category into a binary variable equal to 1 for individuals assigned in the high risk category (risk score between 8 and 10). The data provides us with information related to the historical criminal history, misdemeanors, gender, age and race of each individual. ([Pro Publica, 2016]) and The experiment is carried on three datasets from [Friedler et al., 2018; Kearns et al., 2018]): Adult with 48840 individuals; German with 1000 individuals; and, Crimes with 1994 communities.
Dataset Splits	Yes	First, we split 70%/30% the input data into a train and test set. Using a 5 fold cross-validation, mdfa is trained on four folds and a grid search looks for regularization parameters that minimize the maximum-mean-discrepancy Gk(u, s) and the empirical risk on the fifth fold.
Hardware Specification	No	The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies	No	The paper mentions types of models and classifiers (e.g., neural network, support vector machine, logistic regression) but does not provide specific software names with version numbers for libraries or frameworks used.
Experiment Setup	Yes	mdfa first uses a neural network with four fully connected layers of 8 neurons to express the weights u as a function of the features x and minimizes the maximum-mean discrepancy function Ĝk(u, s). and mdfa is trained using a support vector machine (RBF kernel) on a unbalanced data (µ = 0.2) with value of δm varying from 0 to 3.0.
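
The Dataset Splits and Open Datasets rows quote a 70%/30% train/test split, a 5-fold cross-validation in which four folds are used for fitting and the fifth for validation, and a binary label equal to 1 for risk scores between 8 and 10. The following is a minimal sketch of that index bookkeeping only, not the authors' code. The function names, the shuffling, the seed, and the label 0 for all other scores are assumptions made for illustration; the grid search over regularization parameters and the model itself are omitted.

```python
import random


def train_test_split_indices(n, test_fraction=0.3, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, test) lists.

    Shuffling and the seed are assumptions; the quoted text only gives 70%/30%.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (fit, validation) index lists, holding out one fold at a time."""
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        fit = [idx for j, fold in enumerate(folds) if j != held_out for idx in fold]
        yield fit, folds[held_out]


def high_risk_label(risk_score):
    """Return 1 for risk scores between 8 and 10, else 0 (0 is an assumption)."""
    return 1 if 8 <= risk_score <= 10 else 0


# Example with 1000 records, the size quoted for the German dataset.
train, test = train_test_split_indices(1000)
splits = list(k_fold_indices(train, k=5))
```

Each element of `splits` pairs four folds for fitting with one held-out fold, so a grid search would evaluate every candidate parameter setting on the held-out fold of each pair.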