Superhuman Fairness

Authors: Omid Memarrast, Linh Vu, Brian D Ziebart

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on standard fairness datasets (Adult and COMPAS) using accuracy as a performance measure and three conflicting fairness definitions: Demographic Parity (Calders et al., 2009), Equalized Odds (Hardt et al., 2016), and Predictive Rate Parity (Chouldechova, 2017). Though our motivation is to outperform human decisions, we employ a synthetic decision-maker with differing amounts of label and group membership noise to identify sufficient conditions for superhuman fairness of varying degrees. We find that our approach achieves high levels of superhuman performance that increase rapidly with reference decision noise and significantly outperform the superhumanness of other methods that are based on more narrow fairness-performance objectives. (Sketches of the three fairness metrics appear after the table.)
Researcher Affiliation | Academia | Omid Memarrast¹, Linh Vu¹, Brian Ziebart¹. ¹Department of Computer Science, University of Illinois Chicago, Chicago, USA. Correspondence to: Omid Memarrast <omemar2@uic.edu>.
Pseudocode | Yes | Algorithm 1: Subdominance policy gradient optimization. (A speculative sketch of this optimization appears after the table.)
Open Source Code | Yes | Our code is publicly available at https://github.com/omidMemari/superhumn-fairness.
Open Datasets | Yes | UCI Adult dataset (Dheeru & Karra Taniskidou, 2017) considers predicting whether a household’s income exceeds $50K/yr based on census data... COMPAS dataset (Larson et al., 2016) considers predicting recidivism with group membership based on race.
Dataset Splits | No | No explicit mention of a validation split for model tuning; only train and test splits are described (train-all/test-all and train-demo/test-demo).
Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, or cloud instance types) are mentioned for the experimental setup.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) are mentioned.
Experiment Setup | Yes | We use a logistic regression model Pθ₀ with first-order moment feature functions, ϕ(y, x) = [x₁y, x₂y, ..., xₘy]ᵀ, and weights θ applied independently to each item as our decision model. ... We employ a learning rate of η = 0.01. (A sketch of this decision model appears after the table.)
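
The three fairness definitions cited in the Research Type row are standard group-fairness measures. Below is a minimal NumPy sketch of the corresponding disparity gaps; it is not taken from the paper's repository, and the function names and binary group/label encoding are assumptions.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """|P(yhat=1 | A=0) - P(yhat=1 | A=1)| (Calders et al., 2009)."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equalized_odds_gap(y_true, y_pred, group):
    """Largest gap in TPR or FPR across groups (Hardt et al., 2016)."""
    gaps = []
    for y in (0, 1):  # condition on the true label: FPR when y=0, TPR when y=1
        rates = [y_pred[(group == a) & (y_true == y)].mean() for a in (0, 1)]
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)

def predictive_rate_parity_gap(y_true, y_pred, group):
    """Gap in positive predictive value P(y=1 | yhat=1) across groups
    (Chouldechova, 2017)."""
    ppv = [y_true[(group == a) & (y_pred == 1)].mean() for a in (0, 1)]
    return abs(ppv[0] - ppv[1])
```

All three take binary NumPy arrays of the same length; note that an empty group/label cell yields NaN, which a production implementation would guard against.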
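Algorithm 1 is only named in the extracted text, so the following is a speculative sketch of how a subdominance-minimizing policy gradient (REINFORCE-style) update could look, assuming a hinge-style subdominance over per-objective cost vectors and a stochastic logistic decision policy. The function names, the exact subdominance form, and the fixed margin weights `alpha` are all assumptions, not the paper's definitions.

```python
import numpy as np

def subdominance(costs, ref_costs, alpha):
    """Hinge-style subdominance: a penalty on each objective where the
    model's cost fails to beat the reference (human) cost by a margin.
    All objectives are framed so that lower is better."""
    return np.sum(np.maximum(alpha * (costs - ref_costs) + 1.0, 0.0))

def policy_gradient_step(theta, X, ref_costs, alpha, cost_fn,
                         eta=0.01, n_samples=32):
    """One REINFORCE-style update: sample decisions from the stochastic
    logistic policy, score each sample's subdominance against the
    reference decisions, and shift theta toward low-subdominance samples."""
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        p = 1.0 / (1.0 + np.exp(-X @ theta))             # P(yhat=1 | x)
        y_hat = (np.random.rand(len(p)) < p).astype(float)
        s = subdominance(cost_fn(y_hat), ref_costs, alpha)
        grad += s * (X.T @ (y_hat - p))                  # s * grad log-likelihood
    return theta - eta * grad / n_samples                # descend expected subdominance
```

Here `cost_fn` maps sampled decisions to a vector of performance/fairness costs (e.g., error rate plus the three gaps above), and `alpha` is held fixed for brevity; the full method may additionally optimize these margin weights.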
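The Experiment Setup row's feature function ϕ(y, x) = [x₁y, ..., xₘy]ᵀ makes the y = 0 potential constant (exp(0) = 1), so the decision model reduces to ordinary logistic regression. A minimal sketch under that reading, with the helper name my own:

```python
import numpy as np

def predict_proba(theta, X):
    """P_theta(y=1 | x) for moment features phi(y, x) = x * y:
    the y=0 potential is exp(0) = 1 and the y=1 potential is
    exp(theta . x), giving a standard logistic (sigmoid) model."""
    return 1.0 / (1.0 + np.exp(-X @ theta))

ETA = 0.01  # learning rate eta reported in the experiment setup
```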