reproducibilityindex.ai

Post-hoc bias scoring is optimal for fair classification

Authors: Wenlong Chen, Yegor Klochkov, Yang Liu

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We achieve competitive or better performance compared to both in-processing and post-processing methods across three datasets: Adult, COMPAS, and CelebA. (Section 4, Experiments) We evaluate MBS on real-world binary classification tasks with the following experimental set-up.
Researcher Affiliation	Collaboration	Wenlong Chen (Imperial College London, wenlong.chen21@imperial.ac.uk); Yegor Klochkov (ByteDance Research, yegor.klochkov@bytedance.com); Yang Liu (ByteDance Research, yang.liu01@bytedance.com)
Pseudocode	Yes	See detailed description in Algorithm 1 in the appendix, Section B.1. A formal algorithm is summarized in Algorithm 2 in the appendix, Section B.1. See detailed procedure in Algorithm 3, Section B.1 in the appendix.
Open Source Code	Yes	Details for the experimental set-up are provided in the beginning of Section 4, and the code can be found at https://github.com/chenw20/BiasScore.
Open Datasets	Yes	Adult Census (Kohavi, 1996), a UCI tabular dataset where the task is to predict whether the annual income of an individual is above $50,000. COMPAS (Angwin et al., 2015), a tabular dataset where the task is to predict the recidivism of criminals. CelebA (Liu et al., 2015), a facial image dataset containing 200k instances each with 40 binary attribute annotations.
Dataset Splits	Yes	We randomly split the dataset into a training, validation and test set with 30000, 5000 and 10222 instances respectively. The dataset is randomly split into a training, validation and test set with 3166, 1056 and 1056 instances respectively.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies	No	The paper mentions using a 'ResNet-18' and general ML concepts but does not list specific software dependencies with version numbers (e.g., 'Python 3.x, PyTorch 1.x, CUDA 11.x').
Experiment Setup	Yes	Network architectures and hyperparameters. We use an MLP for Adult Census and COMPAS datasets, with hidden dimension chosen to be 8 and 16 respectively. For each CelebA experiment, we use a ResNet-18 (He et al., 2016). Specifically, for CelebA... we train multiple sensitive classifiers with different weight decay λ increasing from 0.001 to 0.1.
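
The random train/validation/test splits quoted in the Dataset Splits row could be reproduced along the following lines. This is only a minimal sketch, not the authors' code: the function name split_indices and the seed value are hypothetical, and the page does not say which seed or shuffling routine the paper used. Only the six split sizes come from the row above.

```python
import random


def split_indices(n_total, n_train, n_val, n_test, seed=0):
    """Shuffle range(n_total) and cut it into three disjoint index lists."""
    if n_train + n_val + n_test != n_total:
        raise ValueError("split sizes must sum to the dataset size")
    # The seed is an assumption; no seed is reported on this page.
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# Split sizes quoted in the "Dataset Splits" row, in the order they appear.
first_split = split_indices(30000 + 5000 + 10222, 30000, 5000, 10222)
second_split = split_indices(3166 + 1056 + 1056, 3166, 1056, 1056)
```

Fixing the seed makes the split repeatable across runs, which is the property a reproduction would need, even though the specific partition will differ from the one used in the paper.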