Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fairness Constraints: A Flexible Approach for Fair Classification
Authors: Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez-Rodriguez, Krishna P. Gummadi
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on multiple synthetic and real-world datasets show that our framework is able to successfully limit unfairness, often at a small cost in terms of accuracy. Keywords: Supervised learning, margin-based classifiers, fairness, discrimination, disparate impact. |
| Researcher Affiliation | Collaboration | Muhammad Bilal Zafar (Bosch Center for Artificial Intelligence and Max Planck Institute for Software Systems, Saarbrücken, Germany); Isabel Valera (Max Planck Institute for Intelligent Systems, Tübingen, Germany); Manuel Gomez-Rodriguez (Max Planck Institute for Software Systems, Kaiserslautern, Germany); Krishna P. Gummadi (Max Planck Institute for Software Systems, Saarbrücken, Germany) |
| Pseudocode | Yes | Algorithm 1: Baseline method for removing disparate mistreatment w.r.t. FPR. Input: Training set D = {(x_i, y_i, z_i)}_{i=1}^N, ϵ > 0. Output: Fair baseline decision boundary θ*. Initialize: Penalty C = 1. 1: Train (unfair) classifier θ* = argmin_θ Σ_{d∈D} L(θ, d). 2: Compute ŷ_i = sign(d_θ*(x_i)) and D_FPR on D. |
| Open Source Code | Yes | Open-source code implementation is available at: http://fate-computing.mpi-sws.org/. |
| Open Datasets | Yes | Here, we experiment with two real-world datasets: The Adult income dataset (Adult, 1996) and the Bank marketing dataset (Bank, 2014)... The ProPublica COMPAS dataset consists of data about 7,215 pretrial criminal defendants... The NYPD SQF dataset consists of 84,868 pedestrians who were stopped in the year 2012 |
| Dataset Splits | Yes | In all the experiments, to obtain more reliable estimates of accuracy and fairness, we repeatedly split each dataset into a train (70%) and test (30%) set 5 times and report the average statistics for accuracy and fairness. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | Our fairness constraints can be solved using standard convex or convex-concave optimizers (e.g., SLSQP and DCCP, respectively) and we faced no scalability issues during the experiments conducted in this paper. |
| Experiment Setup | Yes | More specifically, for any convex boundary-based classifier, our framework defines an intuitive measure of decision boundary unfairness: the covariance between the sensitive features and the signed distance between the (non-sensitive) feature vectors and the decision boundary of the classifier, computed over a subset of the subjects that depends on the fairness notion of interest... To overcome the unfairness, we train logistic regression classifiers with disparate impact constraints (Eq. 4.13) on both datasets. Figure 2 shows the decision boundaries provided by the classifiers for two (successively decreasing) covariance thresholds, c... For the SVM classifier, the hyperparameter C (in Eq. (4.15)) is only cross-validated for the unconstrained classifier, and the same hyperparameter is used for the fairness-constrained classifiers. |
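The decision-boundary unfairness measure quoted above can be sketched in a few lines: for a linear boundary, the signed distance is just the dot product with the parameter vector, and the measure is its empirical covariance with the sensitive attribute. This is a minimal illustration, not the paper's implementation; the variable names and the synthetic data are assumptions made here.

```python
import numpy as np

def boundary_covariance(theta, X, z):
    """Empirical covariance between a binary sensitive attribute z and the
    signed distance d_theta(x) = theta . x to a linear decision boundary
    (the decision-boundary unfairness measure described in the paper)."""
    d = X @ theta                       # signed distances to the boundary
    return np.mean((z - z.mean()) * d)  # empirical covariance Cov(z, d)

# Illustrative check: a boundary aligned with the sensitive attribute
# yields a large covariance (an "unfair" boundary); an orthogonal one
# yields a covariance near zero.
rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=1000).astype(float)  # binary sensitive feature
X = np.c_[z + rng.normal(size=1000), rng.normal(size=1000)]
print(abs(boundary_covariance(np.array([1.0, 0.0]), X, z)))  # large
print(abs(boundary_covariance(np.array([0.0, 1.0]), X, z)))  # near zero
```

In the paper this covariance is bounded by a threshold c and added as a convex constraint to the classifier's training problem; shrinking c trades accuracy for fairness.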
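The evaluation protocol reported under "Dataset Splits" (5 repeated 70/30 train/test splits, averaging accuracy and fairness) can be sketched as follows. This is a generic illustration of that protocol, assuming an arbitrary scalar metric function; all names here are illustrative.

```python
import numpy as np

def repeated_split_eval(X, y, eval_fn, n_repeats=5, train_frac=0.7, seed=0):
    """Average a metric over repeated random train/test splits, mirroring
    the paper's protocol (5 repeats of a 70/30 split).
    eval_fn(X_tr, y_tr, X_te, y_te) must return a scalar metric."""
    rng = np.random.default_rng(seed)
    n, scores = len(y), []
    for _ in range(n_repeats):
        idx = rng.permutation(n)          # fresh random split each repeat
        cut = int(train_frac * n)
        tr, te = idx[:cut], idx[cut:]
        scores.append(eval_fn(X[tr], y[tr], X[te], y[te]))
    return float(np.mean(scores))

# Example: test accuracy of a majority-class predictor on a toy dataset
X = np.zeros((100, 2))
y = np.array([1] * 70 + [0] * 30)
acc = repeated_split_eval(
    X, y, lambda Xtr, ytr, Xte, yte: np.mean(yte == (np.mean(ytr) >= 0.5)))
print(acc)
```

In the paper the same repeated-split averaging is applied to both the accuracy and the fairness statistics of the constrained classifiers.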