Post-hoc bias scoring is optimal for fair classification
Authors: Wenlong Chen, Yegor Klochkov, Yang Liu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We achieve competitive or better performance compared to both in-processing and post-processing methods across three datasets: Adult, COMPAS, and CelebA. Section 4 (Experiments): We evaluate MBS on real-world binary classification tasks with the following experimental set-up. |
| Researcher Affiliation | Collaboration | Wenlong Chen (Imperial College London, wenlong.chen21@imperial.ac.uk); Yegor Klochkov (ByteDance Research, yegor.klochkov@bytedance.com); Yang Liu (ByteDance Research, yang.liu01@bytedance.com) |
| Pseudocode | Yes | See detailed description in Algorithm 1 in the appendix, Section B.1. A formal algorithm is summarized in Algorithm 2 in the appendix, Section B.1. See detailed procedure in Algorithm 3, Section B.1 in the appendix. |
| Open Source Code | Yes | Details for the experimental set-up are provided at the beginning of Section 4, and the code can be found at https://github.com/chenw20/BiasScore. |
| Open Datasets | Yes | Adult Census (Kohavi, 1996), a UCI tabular dataset where the task is to predict whether the annual income of an individual is above $50,000. COMPAS (Angwin et al., 2015), a tabular dataset where the task is to predict the recidivism of criminals. CelebA (Liu et al., 2015), a facial image dataset containing 200k instances, each with 40 binary attribute annotations. |
| Dataset Splits | Yes | Adult Census: We randomly split the dataset into a training, validation and test set with 30000, 5000 and 10222 instances respectively. COMPAS: The dataset is randomly split into a training, validation and test set with 3166, 1056 and 1056 instances respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using a 'ResNet-18' and general ML concepts but does not list specific software dependencies with version numbers (e.g., 'Python 3.x, PyTorch 1.x, CUDA 11.x'). |
| Experiment Setup | Yes | Network architectures and hyperparameters. We use an MLP for Adult Census and COMPAS datasets, with hidden dimension chosen to be 8 and 16 respectively. For each CelebA experiment, we use a ResNet-18 (He et al., 2016). Specifically, for CelebA... we train multiple sensitive classifiers with different weight decay λ increasing from 0.001 to 0.1. |
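
The split sizes quoted in the Dataset Splits row can be reproduced with a simple index shuffle. The following is a minimal sketch, not the authors' code: the helper `random_split`, the seed, and the attribution of the two size triples to Adult Census and COMPAS (inferred from the order in which the datasets are listed) are all assumptions made for illustration.

```python
import random


def random_split(n_total, n_train, n_val, n_test, seed=0):
    """Return disjoint index lists for a random train/validation/test split.

    Hypothetical helper for illustration; the seed is an arbitrary choice.
    """
    if n_train + n_val + n_test != n_total:
        raise ValueError("split sizes must sum to the dataset size")
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seeded shuffle, reproducible
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# Sizes taken from the table; the dataset attribution is an assumption.
adult_train, adult_val, adult_test = random_split(
    30000 + 5000 + 10222, 30000, 5000, 10222
)
compas_train, compas_val, compas_test = random_split(
    3166 + 1056 + 1056, 3166, 1056, 1056
)
```

Each returned list holds row indices into the full dataset, so the same split can be applied to features, labels, and sensitive attributes alike.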