Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Predict Responsibly: Improving Fairness and Accuracy by Learning to Defer
Authors: David Madras, Toni Pitassi, Richard Zemel
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that learning to defer can make systems not only more accurate but also less biased. |
| Researcher Affiliation | Academia | David Madras, Toniann Pitassi & Richard Zemel, University of Toronto, Vector Institute |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm', nor are there structured code-like blocks describing procedures. |
| Open Source Code | Yes | Code available at https://github.com/dmadras/predict-responsibly. |
| Open Datasets | Yes | We use two datasets: COMPAS [26], where we predict a defendant's recidivism without discriminating by race, and Heritage Health (https://www.kaggle.com/c/hhp)... |
| Dataset Splits | No | The paper mentions 'All results are on held-out test sets' and that models are trained, but does not explicitly provide details about training/validation/test splits, specific percentages, or how a validation set was used for hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU or GPU models, memory, or cloud computing specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers, such as programming language versions or library versions (e.g., TensorFlow, PyTorch, scikit-learn versions). |
| Experiment Setup | Yes | We train all models and DMs with a fully-connected two-layer neural network... We show results across various hyperparameter settings (α_fair, γ_defer/γ_reject)... To simulate high-bias DMs (scen. 2) we train a regularized model with α_fair = 0.1... To create inconsistent DMs (scen. 3), we flip a subset of the DM's predictions post-hoc with 30% probability... |
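
The post-hoc flipping step quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code: the function name `flip_predictions`, the `in_subset` mask, and the fixed seed are hypothetical, and the quoted excerpt does not say how the subset is chosen, so the mask is simply passed in by the caller. Only the 30% flip probability comes from the text above.

```python
import random


def flip_predictions(preds, in_subset, flip_prob=0.3, seed=0):
    """Flip binary predictions inside a chosen subset with probability flip_prob.

    preds:     list of 0/1 decisions from the decision-maker (DM).
    in_subset: list of bools marking which examples are eligible for flipping
               (hypothetical; the excerpt does not define the subset).
    flip_prob: probability of flipping each eligible prediction (0.3 in the excerpt).
    seed:      seed for a local random generator, so runs are repeatable.
    """
    if len(preds) != len(in_subset):
        raise ValueError("preds and in_subset must have the same length")

    rng = random.Random(seed)
    flipped = []
    for pred, eligible in zip(preds, in_subset):
        # Only eligible examples can be flipped; all others pass through unchanged.
        if eligible and rng.random() < flip_prob:
            flipped.append(1 - pred)
        else:
            flipped.append(pred)
    return flipped


# Example: only the first four predictions are eligible for flipping.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
in_subset = [True, True, True, True, False, False, False, False]
noisy_preds = flip_predictions(preds, in_subset)
```

Passing the subset as an explicit mask keeps the sketch independent of how eligibility is defined, which the excerpt leaves open.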