Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Superhuman Fairness
Authors: Omid Memarrast, Linh Vu, Brian D. Ziebart
ICML 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on standard fairness datasets (Adult and COMPAS) using accuracy as a performance measure and three conflicting fairness definitions: Demographic Parity (Calders et al., 2009), Equalized Odds (Hardt et al., 2016), and Predictive Rate Parity (Chouldechova, 2017). Though our motivation is to outperform human decisions, we employ a synthetic decision-maker with differing amounts of label and group membership noise to identify sufficient conditions for superhuman fairness of varying degrees. We find that our approach achieves high levels of superhuman performance that increase rapidly with reference decision noise and significantly outperform the superhumanness of other methods that are based on more narrow fairness-performance objectives. |
| Researcher Affiliation | Academia | Omid Memarrast¹, Linh Vu¹, Brian Ziebart¹. ¹Department of Computer Science, University of Illinois Chicago, Chicago, USA. Correspondence to: Omid Memarrast <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Subdominance policy gradient optimization |
| Open Source Code | Yes | Our code is publicly available at https://github.com/omidMemari/superhumn-fairness. |
| Open Datasets | Yes | UCI Adult dataset (Dheeru & Karra Taniskidou, 2017) considers predicting whether a household’s income exceeds $50K/yr based on census data... COMPAS dataset (Larson et al., 2016) considers predicting recidivism with group membership based on race. |
| Dataset Splits | No | No explicit mention of a 'validation' dataset split for model tuning, only train and test splits (train-all/test-all and train-demo/test-demo). |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, or cloud instance types) were mentioned for the experimental setup. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) were mentioned. |
| Experiment Setup | Yes | We use a logistic regression model Pθ₀ with first-order moment feature functions, ϕ(y, x) = [x₁y, x₂y, …, xₘy], and weights θ applied independently on each item as our decision model. ... We employ a learning rate of η = 0.01. |
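The "Experiment Setup" row describes a logistic regression decision model with first-order moment features ϕ(y, x) = [x₁y, …, xₘy] and a learning rate of η = 0.01. The sketch below illustrates what such a model looks like when trained by plain gradient ascent on the log-likelihood; it is a minimal illustration, not the paper's subdominance policy-gradient method, and all function and variable names (`fit_logistic`, `n_steps`, the synthetic data) are our own assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, eta=0.01, n_steps=1000):
    """Logistic regression via batch gradient ascent.

    With first-order moment features phi(y, x) = x * y, the model reduces to
    P_theta(y=1 | x) = sigmoid(theta @ x), and the average log-likelihood
    gradient is X^T (y - p) / n.
    """
    n, m = X.shape
    theta = np.zeros(m)
    for _ in range(n_steps):
        p = sigmoid(X @ theta)             # predicted P(y=1 | x) per item
        theta += eta * X.T @ (y - p) / n   # gradient-ascent step, eta = 0.01
    return theta

# Tiny synthetic example: 1-D separable data plus a bias feature.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=200), np.ones(200)])
y = (X[:, 0] > 0).astype(float)
theta = fit_logistic(X, y)
preds = (sigmoid(X @ theta) > 0.5).astype(float)
print("train accuracy:", (preds == y).mean())
```

The paper's actual training objective optimizes subdominance relative to a reference (human or synthetic) decision-maker across multiple fairness measures; the gradient-ascent loop above only shows the underlying parametric decision model.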