Unprocessing Seven Years of Algorithmic Fairness
Authors: André Cruz, Moritz Hardt
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate these claims through thousands of model evaluations on several tabular datasets. In doing so, we address two common methodological errors that have confounded previous observations. One relates to the comparison of methods with different unconstrained base models. The other concerns methods achieving different levels of constraint relaxation. At the heart of our study is a simple idea we call unprocessing that roughly corresponds to the inverse of postprocessing. Unprocessing allows for a direct comparison of methods using different underlying models and levels of relaxation. (A simplified reading of unprocessing is sketched after this table.) |
| Researcher Affiliation | Academia | André F. Cruz & Moritz Hardt, Max Planck Institute for Intelligent Systems, Tübingen, and Tübingen AI Center |
| Pseudocode | No | The paper describes its algorithms and formulations (e.g., a linear programming formulation for relaxed error-rate parity) using mathematical equations and descriptive text, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. (A standard statement of the relaxed constraint is given after this table.) |
| Open Source Code | Yes | We contribute a linear programming formulation to achieve approximate error rate parity for postprocessing, and open-source our implementation in an easy-to-use Python package called error-parity. (A usage sketch follows this table.) |
| Open Datasets | Yes | We evaluate all methods on five large public benchmark datasets from the folktables Python package (Ding et al., 2021). These datasets are derived from the American Community Survey (ACS) public use microdata sample from 2018, containing a variety of demographic features (e.g., age, race, education). We also conduct a similar experiment on the MEPS dataset, shown in Appendix A.5. (A data-loading sketch follows this table.) |
| Dataset Splits | Yes | We conduct the following procedure for each dataset, with a 60%/20%/20% train/test/validation data split. (One way to realize this split is sketched after this table.) |
| Hardware Specification | Yes | Each job was given the same computing resources: 1 CPU. Compute nodes use AMD EPYC 7662 64-core CPUs. No GPUs were used. |
| Software Dependencies | No | The paper mentions using specific Python libraries such as 'aif360' and 'fairlearn', as well as the authors' own 'error-parity' package, but it does not specify exact version numbers for these or any other software components. |
| Experiment Setup | Yes | Each model is trained with a different randomly-sampled selection of hyperparameters (e.g., learning rate of a GBM, number of trees of an RF, weight regularization of an LR). ... Detailed hyperparameter search spaces for each algorithm are included in folder hyperparameters_spaces of the supplementary materials. (An illustrative sampling sketch appears after this table.) |
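
The sketches below expand on rows of the table. First, unprocessing: the abstract describes it only as roughly the inverse of postprocessing, so the following is a deliberately simplified, deterministic reading, not the paper's actual linear-programming procedure; the function name and the threshold grid are our own.

```python
import numpy as np

def unprocess_thresholds(scores, y, group, grid=np.linspace(0.0, 1.0, 101)):
    """Pick, for each group, the decision threshold that minimizes 0/1 error.

    Simplified reading of 'unprocessing': re-optimize group-specific
    thresholds of a (possibly fairness-constrained) model's scores for
    accuracy alone. The paper's method optimizes over randomized
    thresholds via a linear program; this grid search is illustrative.
    """
    thresholds = {}
    for g in np.unique(group):
        mask = group == g
        accuracies = [((scores[mask] >= t) == y[mask]).mean() for t in grid]
        thresholds[g] = grid[int(np.argmax(accuracies))]
    return thresholds
```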
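On the pseudocode row: the paper states its relaxed error-rate parity objective as a linear program rather than an algorithm block. A standard way to write the ε-relaxed constraint (equalized odds with additive slack) is shown below; the paper's exact LP variables and objective are in the paper itself.

```latex
% epsilon-relaxed error-rate parity (equalized odds with slack epsilon):
% group-conditional error rates may differ by at most epsilon.
\left| \Pr[\hat{Y} = 1 \mid Y = y, A = a] - \Pr[\hat{Y} = 1 \mid Y = y, A = b] \right| \le \varepsilon,
\qquad \forall\, a, b \in \mathcal{A},\; y \in \{0, 1\}
```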
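On the open-source row: a minimal usage sketch of the authors' error-parity package, following its documented interface (class and parameter names are taken from the package's README and may differ across versions); the synthetic data is ours.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from error_parity import RelaxedThresholdOptimizer

# Synthetic data with a binary sensitive attribute, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))
group = rng.integers(0, 2, size=5000)
y = (X[:, 0] + 0.5 * group + rng.normal(size=5000) > 0).astype(int)
fit, val = slice(0, 3000), slice(3000, 5000)

model = GradientBoostingClassifier().fit(X[fit], y[fit])

# Postprocess the model's scores for relaxed equalized odds.
fair_clf = RelaxedThresholdOptimizer(
    predictor=lambda X: model.predict_proba(X)[:, -1],  # score function
    constraint="equalized_odds",
    tolerance=0.05,  # maximum allowed constraint violation
)
fair_clf.fit(X=X[val], y=y[val], group=group[val])
y_pred = fair_clf(X=X[val], group=group[val])  # group-aware binary predictions
```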
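On the datasets row: loading one of the five ACS tasks via folktables, following that package's documented interface; the paper uses 2018 ACS data across many states, and restricting to a single state here is our simplification.

```python
from folktables import ACSDataSource, ACSIncome

# Download the 2018 ACS 1-Year person-level sample (California only, for brevity).
data_source = ACSDataSource(survey_year="2018", horizon="1-Year", survey="person")
acs_data = data_source.get_data(states=["CA"], download=True)

# Features, binary label, and protected-group membership as numpy arrays.
features, label, group = ACSIncome.df_to_numpy(acs_data)
```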
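On the splits row: one way to realize the stated 60%/20%/20% split with scikit-learn (the split tooling is our assumption; the paper's own code is in its supplementary materials).

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.default_rng(0).normal(size=(1000, 5))  # placeholder data
y = (X[:, 0] > 0).astype(int)

# 60% train, then split the remaining 40% evenly into test and validation.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.6, random_state=42)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)
```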
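On the experiment-setup row: an illustrative random hyperparameter draw; the actual search spaces live in the paper's hyperparameters_spaces folder, so every range below is a placeholder.

```python
import random

rng = random.Random(0)

def sample_gbm_params():
    """Draw one random GBM configuration (ranges are illustrative only)."""
    return {
        "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform in [1e-3, 1]
        "n_estimators": rng.randint(50, 1000),
        "max_depth": rng.randint(2, 10),
    }

# Each trained model gets an independent random configuration.
configs = [sample_gbm_params() for _ in range(5)]
```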