FRAPPÉ: A Group Fairness Framework for Post-Processing Everything

Authors: Alexandru Tifrea, Preethi Lahoti, Ben Packer, Yoni Halpern, Ahmad Beirami, Flavien Prost

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show theoretically and through extensive experiments that our framework preserves the good fairness-error trade-offs achieved with in-processing and can improve over the effectiveness of prior post-processing methods.
Researcher Affiliation | Collaboration | 1) Department of Computer Science, ETH Zurich; 2) Google DeepMind. Correspondence to: Alexandru Tifrea <alexandru.tifrea@inf.ethz.ch>, Flavien Prost <fprost@google.com>.
Pseudocode | No | The paper describes its methods through text and mathematical equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/google-research/google-research/tree/master/postproc_fairness.
Open Datasets | Yes | We conduct experiments on standard datasets for assessing fairness mitigation techniques, namely Adult (Becker & Kohavi, 1996) and COMPAS (Angwin et al., 2016), as well as two recently proposed datasets: the high school longitudinal study (HSLS) dataset (Jeong et al., 2022) and ENEM (Alghamdi et al., 2022). We also evaluate FRAPPÉ on data with continuous sensitive attributes (i.e. the Communities & Crime dataset (Redmond, 2009)).
Dataset Splits | Yes | We adopt the standard practice in the literature, and select essential hyperparameters such as the learning rate so as to minimize prediction error on a holdout validation set, for all the baselines in our experiments. We select the optimal learning rate by minimizing the prediction error on a held-out validation set.
Hardware Specification | Yes | The machine we used for these measurements has 32 1.5 GHz CPUs.
Software Dependencies | No | The paper mentions toolkits such as 'tensorflow-model-remediation' but does not provide version numbers for the software dependencies used in its implementation, such as Python or core libraries.
Experiment Setup | Yes | We adopt the standard practice in the literature, and select essential hyperparameters such as the learning rate so as to minimize prediction error on a holdout validation set, for all the baselines in our experiments. We use a 1-hidden-layer MLP with 64 hidden units to model the post-processing transformation. The optimal learning rate and early-stopping epoch are selected so as to minimize prediction error on a held-out validation set.
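
For concreteness, the sketch below illustrates the setup quoted in the last two rows in Python/TensorFlow. It is not the authors' released implementation (that lives in the postproc_fairness repository linked above): the 1-hidden-layer MLP with 64 hidden units and the validation-based learning-rate selection come directly from the quoted text, while the input shape, candidate learning rates, loss, and number of epochs are illustrative assumptions, and the fairness regularizer that FRAPPÉ adds to the training objective is omitted.

import tensorflow as tf

def build_postprocessor(num_inputs=1):
    # 1-hidden-layer MLP with 64 hidden units, matching the quoted setup.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(num_inputs,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),  # outputs a logit-scale correction (assumed form)
    ])

def select_learning_rate(x_train, y_train, x_val, y_val,
                         candidate_lrs=(1e-4, 1e-3, 1e-2)):
    # Pick the learning rate that minimizes prediction error on a held-out
    # validation set, mirroring the hyperparameter-selection practice quoted above.
    # The actual FRAPPE objective also includes a fairness penalty (omitted here).
    best_lr, best_err = None, float("inf")
    for lr in candidate_lrs:
        model = build_postprocessor(num_inputs=x_train.shape[1])
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
            loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
            metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.0)],  # logit outputs
        )
        model.fit(x_train, y_train, epochs=10, batch_size=256, verbose=0)
        _, acc = model.evaluate(x_val, y_val, verbose=0)
        if 1.0 - acc < best_err:
            best_lr, best_err = lr, 1.0 - acc
    return best_lr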