Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fairness Constraints: A Flexible Approach for Fair Classification
Authors: Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez-Rodriguez, Krishna P. Gummadi
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on multiple synthetic and real-world datasets show that our framework is able to successfully limit unfairness, often at a small cost in terms of accuracy. Keywords: Supervised learning, margin-based classifiers, fairness, discrimination, disparate impact. |
| Researcher Affiliation | Collaboration | Muhammad Bilal Zafar (Bosch Center for Artificial Intelligence and Max Planck Institute for Software Systems, Saarbrücken, Germany); Isabel Valera (Max Planck Institute for Intelligent Systems, Tübingen, Germany); Manuel Gomez-Rodriguez (Max Planck Institute for Software Systems, Kaiserslautern, Germany); Krishna P. Gummadi (Max Planck Institute for Software Systems, Saarbrücken, Germany) |
| Pseudocode | Yes | Algorithm 1: Baseline method for removing disparate mistreatment w.r.t. FPR. Input: Training set D = {(x_i, y_i, z_i)}_{i=1}^N, ϵ > 0. Output: Fair baseline decision boundary θ*. Initialize: Penalty C = 1. 1: Train (unfair) classifier θ* = argmin_θ Σ_{d∈D} L(θ, d). 2: Compute ŷ_i = sign(d_θ*(x_i)) and D_FPR on D. |
| Open Source Code | Yes | Open-source code implementation is available at: http://fate-computing.mpi-sws.org/. |
| Open Datasets | Yes | Here, we experiment with two real-world datasets: The Adult income dataset (Adult, 1996) and the Bank marketing dataset (Bank, 2014)... The ProPublica COMPAS dataset consists of data about 7,215 pretrial criminal defendants... The NYPD SQF dataset consists of 84,868 pedestrians who were stopped in the year 2012 |
| Dataset Splits | Yes | In all the experiments, to obtain more reliable estimates of accuracy and fairness, we repeatedly split each dataset into a train (70%) and test (30%) set 5 times and report the average statistics for accuracy and fairness. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | Our fairness constraints can be solved using standard convex or convex-concave optimizers (e.g., SLSQP and DCCP, respectively) and we faced no scalability issues during the experiments conducted in this paper. |
| Experiment Setup | Yes | More specifically, for any convex boundary-based classifier, our framework defines an intuitive measure of decision boundary unfairness: the covariance between the sensitive features and the signed distance between the (non-sensitive) feature vectors and the decision boundary of the classifier, computed over a subset of the subjects that depends on the fairness notion of interest... To overcome the unfairness, we train logistic regression classifiers with disparate impact constraints (Eq. 4.13) on both datasets. Figure 2 shows the decision boundaries provided by the classifiers for two (successively decreasing) covariance thresholds, c... For the SVM classifier, the hyperparameter C (in Eq. (4.15)) is only cross-validated for the unconstrained classifier, and the same hyperparameter is used for the fairness-constrained classifiers. |
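The decision-boundary unfairness measure quoted above can be sketched in a few lines: for a linear boundary, the signed distance is just the dot product with the parameter vector, and the measure is its empirical covariance with the sensitive attribute. This is a minimal illustration, not the paper's implementation; the variable names and the synthetic data are assumptions made here.

```python
import numpy as np

def boundary_covariance(theta, X, z):
    """Empirical covariance between a binary sensitive attribute z and the
    signed distance d_theta(x) = theta . x to a linear decision boundary
    (the decision-boundary unfairness measure described in the paper)."""
    d = X @ theta                       # signed distances to the boundary
    return np.mean((z - z.mean()) * d)  # empirical covariance Cov(z, d)

# Illustrative check: a boundary aligned with the sensitive attribute
# yields a large covariance (an "unfair" boundary); an orthogonal one
# yields a covariance near zero.
rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=1000).astype(float)  # binary sensitive feature
X = np.c_[z + rng.normal(size=1000), rng.normal(size=1000)]
print(abs(boundary_covariance(np.array([1.0, 0.0]), X, z)))  # large
print(abs(boundary_covariance(np.array([0.0, 1.0]), X, z)))  # near zero
```

In the paper this covariance is bounded by a threshold c and added as a convex constraint to the classifier's training problem; shrinking c trades accuracy for fairness.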
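The evaluation protocol reported under "Dataset Splits" (5 repeated 70/30 train/test splits, averaging accuracy and fairness) can be sketched as follows. This is a generic illustration of that protocol, assuming an arbitrary scalar metric function; all names here are illustrative.

```python
import numpy as np

def repeated_split_eval(X, y, eval_fn, n_repeats=5, train_frac=0.7, seed=0):
    """Average a metric over repeated random train/test splits, mirroring
    the paper's protocol (5 repeats of a 70/30 split).
    eval_fn(X_tr, y_tr, X_te, y_te) must return a scalar metric."""
    rng = np.random.default_rng(seed)
    n, scores = len(y), []
    for _ in range(n_repeats):
        idx = rng.permutation(n)          # fresh random split each repeat
        cut = int(train_frac * n)
        tr, te = idx[:cut], idx[cut:]
        scores.append(eval_fn(X[tr], y[tr], X[te], y[te]))
    return float(np.mean(scores))

# Example: test accuracy of a majority-class predictor on a toy dataset
X = np.zeros((100, 2))
y = np.array([1] * 70 + [0] * 30)
acc = repeated_split_eval(
    X, y, lambda Xtr, ytr, Xte, yte: np.mean(yte == (np.mean(ytr) >= 0.5)))
print(acc)
```

In the paper the same repeated-split averaging is applied to both the accuracy and the fairness statistics of the constrained classifiers.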