Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Superhuman Fairness
Authors: Omid Memarrast, Linh Vu, Brian D. Ziebart
ICML 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on standard fairness datasets (Adult and COMPAS) using accuracy as a performance measure and three conflicting fairness definitions: Demographic Parity (Calders et al., 2009), Equalized Odds (Hardt et al., 2016), and Predictive Rate Parity (Chouldechova, 2017). Though our motivation is to outperform human decisions, we employ a synthetic decision-maker with differing amounts of label and group membership noise to identify sufficient conditions for superhuman fairness of varying degrees. We find that our approach achieves high levels of superhuman performance that increase rapidly with reference decision noise and significantly outperform the superhumanness of other methods that are based on more narrow fairness-performance objectives. |
| Researcher Affiliation | Academia | Omid Memarrast¹, Linh Vu¹, Brian Ziebart¹. ¹Department of Computer Science, University of Illinois Chicago, Chicago, USA. Correspondence to: Omid Memarrast <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Subdominance policy gradient optimization |
| Open Source Code | Yes | Our code is publicly available at https://github.com/omidMemari/superhumn-fairness. |
| Open Datasets | Yes | UCI Adult dataset (Dheeru & Karra Taniskidou, 2017) considers predicting whether a household’s income exceeds $50K/yr based on census data... COMPAS dataset (Larson et al., 2016) considers predicting recidivism with group membership based on race. |
| Dataset Splits | No | No explicit mention of a 'validation' dataset split for model tuning, only train and test splits (train-all/test-all and train-demo/test-demo). |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, or cloud instance types) were mentioned for the experimental setup. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) were mentioned. |
| Experiment Setup | Yes | We use a logistic regression model Pθ₀ with first-order moment feature functions, ϕ(y, x) = [x₁y, x₂y, …, xₘy], and weights θ applied independently on each item as our decision model. ... We employ a learning rate of η = 0.01. |
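The "Experiment Setup" row describes a logistic regression decision model with first-order moment features ϕ(y, x) = [x₁y, …, xₘy] and a learning rate of η = 0.01. The sketch below illustrates what such a model looks like when trained by plain gradient ascent on the log-likelihood; it is a minimal illustration, not the paper's subdominance policy-gradient method, and all function and variable names (`fit_logistic`, `n_steps`, the synthetic data) are our own assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, eta=0.01, n_steps=1000):
    """Logistic regression via batch gradient ascent.

    With first-order moment features phi(y, x) = x * y, the model reduces to
    P_theta(y=1 | x) = sigmoid(theta @ x), and the average log-likelihood
    gradient is X^T (y - p) / n.
    """
    n, m = X.shape
    theta = np.zeros(m)
    for _ in range(n_steps):
        p = sigmoid(X @ theta)             # predicted P(y=1 | x) per item
        theta += eta * X.T @ (y - p) / n   # gradient-ascent step, eta = 0.01
    return theta

# Tiny synthetic example: 1-D separable data plus a bias feature.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=200), np.ones(200)])
y = (X[:, 0] > 0).astype(float)
theta = fit_logistic(X, y)
preds = (sigmoid(X @ theta) > 0.5).astype(float)
print("train accuracy:", (preds == y).mean())
```

The paper's actual training objective optimizes subdominance relative to a reference (human or synthetic) decision-maker across multiple fairness measures; the gradient-ascent loop above only shows the underlying parametric decision model.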