Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Does mitigating ML's impact disparity require treatment disparity?
Authors: Zachary Lipton, Julian McAuley, Alexandra Chouldechova
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on several real-world datasets highlight the practical consequences of applying DLPs. |
| Researcher Affiliation | Academia | ¹Carnegie Mellon University; ²University of California, San Diego |
| Pseudocode | No | The paper describes procedures in narrative form (e.g., 'Our thresholding rule for maximizing accuracy subject to a p-% rule works as follows...'), but does not include structured pseudocode blocks or algorithms. |
| Open Source Code | No | The paper states 'code and data will be released at publication time', which is a promise of future release, not concrete access. It also mentions using third-party code: 'We apply the DLP proposed by Zafar et al. [5], using code available from the authors.' (footnote 2: https://github.com/mbilalzafar/fair-classification/) |
| Open Datasets | Yes | To construct the data, we sample n_all = 2000 total observations from the data-generating process described below. 70% of the observations are used for training, and the remaining 30% are reserved for model testing. ... We consider a sample of 9,000 students considered for admission ... Half of the examples are withheld for testing. ... Statistics of public datasets. Income UCI [22] ... Marketing UCI [23] ... Credit UCI [24] ... Employee Attr. IBM [25] ... Customer Attr. IBM [25] |
| Dataset Splits | No | The paper specifies training and testing splits (e.g., '70% of the observations are used for training, and the remaining 30% are reserved for model testing' and 'Half of the examples are withheld for testing'), but does not explicitly mention a separate validation split or strategy. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, cloud instance types) used for running experiments. |
| Software Dependencies | No | The paper mentions applying a DLP from Zafar et al. [5] and logistic regressors, but does not provide specific software names with version numbers for reproducibility (e.g., Python, PyTorch, scikit-learn versions). |
| Experiment Setup | No | The paper describes the data-generating process and mentions training logistic regressors, but does not specify concrete hyperparameters (e.g., learning rate, batch size, number of epochs, optimizer settings) or other system-level training configurations for these models. |
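
The 70%/30% train/test split quoted in the table for the 2000 synthetic observations can be illustrated with a minimal sketch. This is not the authors' code: the function name `split_indices`, the use of a random shuffle, and the seed value are assumptions made for illustration, since the paper's quoted text reports only the proportions.

```python
import random


def split_indices(n_total=2000, train_fraction=0.7, seed=0):
    """Shuffle indices 0..n_total-1 and split them into train and test lists.

    Hypothetical helper: the shuffle and the seed are assumptions; the quoted
    text only gives n_all = 2000 and the 70%/30% proportions.
    """
    rng = random.Random(seed)  # assumed seed, not reported in the source
    indices = list(range(n_total))
    rng.shuffle(indices)
    n_train = round(n_total * train_fraction)  # 1400 when n_total = 2000
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices()
print(len(train_idx), len(test_idx))  # prints "1400 600"
```

Fixing the seed makes the partition repeatable across runs, which is the kind of detail the "Dataset Splits" and "Experiment Setup" rows check for.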