Unprocessing Seven Years of Algorithmic Fairness
Authors: André Cruz, Moritz Hardt
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate these claims through thousands of model evaluations on several tabular datasets. In doing so, we address two common methodological errors that have confounded previous observations. One relates to the comparison of methods with different unconstrained base models. The other concerns methods achieving different levels of constraint relaxation. At the heart of our study is a simple idea we call unprocessing that roughly corresponds to the inverse of postprocessing. Unprocessing allows for a direct comparison of methods using different underlying models and levels of relaxation. (A simplified reading of unprocessing is sketched after this table.) |
| Researcher Affiliation | Academia | André F. Cruz & Moritz Hardt, Max Planck Institute for Intelligent Systems, Tübingen, and Tübingen AI Center |
| Pseudocode | No | The paper describes its algorithms and formulations (e.g., a linear programming formulation for relaxed error-rate parity) using mathematical equations and descriptive text, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. (A standard statement of the relaxed constraint is given after this table.) |
| Open Source Code | Yes | We contribute a linear programming formulation to achieve approximate error rate parity for postprocessing, and open-source our implementation in an easy-to-use Python package called error-parity. (A usage sketch follows this table.) |
| Open Datasets | Yes | We evaluate all methods on five large public benchmark datasets from the folktables Python package (Ding et al., 2021). These datasets are derived from the American Community Survey (ACS) public use microdata sample from 2018, containing a variety of demographic features (e.g., age, race, education). We also conduct a similar experiment on the MEPS dataset, shown in Appendix A.5. (A data-loading sketch follows this table.) |
| Dataset Splits | Yes | We conduct the following procedure for each dataset, with a 60%/20%/20% train/test/validation data split. (One way to realize this split is sketched after this table.) |
| Hardware Specification | Yes | Each job was given the same computing resources: 1 CPU. Compute nodes use AMD EPYC 7662 64-core CPUs. No GPUs were used. |
| Software Dependencies | No | The paper mentions using specific Python libraries such as 'aif360' and 'fairlearn', as well as the authors' own 'error-parity' package, but it does not specify exact version numbers for these or any other software components. |
| Experiment Setup | Yes | Each model is trained with a different randomly-sampled selection of hyperparameters (e.g., learning rate of a GBM, number of trees of an RF, weight regularization of an LR). ... Detailed hyperparameter search spaces for each algorithm are included in folder hyperparameters_spaces of the supplementary materials. (An illustrative sampling sketch appears after this table.) |
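
The sketches below expand on rows of the table. First, unprocessing: the abstract describes it only as roughly the inverse of postprocessing, so the following is a deliberately simplified, deterministic reading, not the paper's actual linear-programming procedure; the function name and the threshold grid are our own.

```python
import numpy as np

def unprocess_thresholds(scores, y, group, grid=np.linspace(0.0, 1.0, 101)):
    """Pick, for each group, the decision threshold that minimizes 0/1 error.

    Simplified reading of 'unprocessing': re-optimize group-specific
    thresholds of a (possibly fairness-constrained) model's scores for
    accuracy alone. The paper's method optimizes over randomized
    thresholds via a linear program; this grid search is illustrative.
    """
    thresholds = {}
    for g in np.unique(group):
        mask = group == g
        accuracies = [((scores[mask] >= t) == y[mask]).mean() for t in grid]
        thresholds[g] = grid[int(np.argmax(accuracies))]
    return thresholds
```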
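On the pseudocode row: the paper states its relaxed error-rate parity objective as a linear program rather than an algorithm block. A standard way to write the ε-relaxed constraint (equalized odds with additive slack) is shown below; the paper's exact LP variables and objective are in the paper itself.

```latex
% epsilon-relaxed error-rate parity (equalized odds with slack epsilon):
% group-conditional error rates may differ by at most epsilon.
\left| \Pr[\hat{Y} = 1 \mid Y = y, A = a] - \Pr[\hat{Y} = 1 \mid Y = y, A = b] \right| \le \varepsilon,
\qquad \forall\, a, b \in \mathcal{A},\; y \in \{0, 1\}
```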
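On the open-source row: a minimal usage sketch of the authors' error-parity package, following its documented interface (class and parameter names are taken from the package's README and may differ across versions); the synthetic data is ours.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from error_parity import RelaxedThresholdOptimizer

# Synthetic data with a binary sensitive attribute, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))
group = rng.integers(0, 2, size=5000)
y = (X[:, 0] + 0.5 * group + rng.normal(size=5000) > 0).astype(int)
fit, val = slice(0, 3000), slice(3000, 5000)

model = GradientBoostingClassifier().fit(X[fit], y[fit])

# Postprocess the model's scores for relaxed equalized odds.
fair_clf = RelaxedThresholdOptimizer(
    predictor=lambda X: model.predict_proba(X)[:, -1],  # score function
    constraint="equalized_odds",
    tolerance=0.05,  # maximum allowed constraint violation
)
fair_clf.fit(X=X[val], y=y[val], group=group[val])
y_pred = fair_clf(X=X[val], group=group[val])  # group-aware binary predictions
```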
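On the datasets row: loading one of the five ACS tasks via folktables, following that package's documented interface; the paper uses 2018 ACS data across many states, and restricting to a single state here is our simplification.

```python
from folktables import ACSDataSource, ACSIncome

# Download the 2018 ACS 1-Year person-level sample (California only, for brevity).
data_source = ACSDataSource(survey_year="2018", horizon="1-Year", survey="person")
acs_data = data_source.get_data(states=["CA"], download=True)

# Features, binary label, and protected-group membership as numpy arrays.
features, label, group = ACSIncome.df_to_numpy(acs_data)
```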
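On the splits row: one way to realize the stated 60%/20%/20% split with scikit-learn (the split tooling is our assumption; the paper's own code is in its supplementary materials).

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.default_rng(0).normal(size=(1000, 5))  # placeholder data
y = (X[:, 0] > 0).astype(int)

# 60% train, then split the remaining 40% evenly into test and validation.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.6, random_state=42)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)
```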
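On the experiment-setup row: an illustrative random hyperparameter draw; the actual search spaces live in the paper's hyperparameters_spaces folder, so every range below is a placeholder.

```python
import random

rng = random.Random(0)

def sample_gbm_params():
    """Draw one random GBM configuration (ranges are illustrative only)."""
    return {
        "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform in [1e-3, 1]
        "n_estimators": rng.randint(50, 1000),
        "max_depth": rng.randint(2, 10),
    }

# Each trained model gets an independent random configuration.
configs = [sample_gbm_params() for _ in range(5)]
```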