A Reduction to Binary Approach for Debiasing Multiclass Datasets

Authors: Ibrahim M. Alabdulmohsin, Jessica Schrouff, Sanmi Koyejo

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We prove that R2B satisfies optimality and bias guarantees and demonstrate empirically that it can lead to an improvement over two baselines: (1) treating multiclass problems as multi-label by debiasing labels independently and (2) transforming the features instead of the labels. Surprisingly, we also demonstrate that independent label debiasing yields competitive results in most (but not all) settings. We validate these conclusions on synthetic and real-world datasets from social science, computer vision, and healthcare.
Researcher Affiliation | Industry | Ibrahim Alabdulmohsin, Google Research, Zürich, Switzerland (ibomohsin@google.com); Jessica Schrouff, Google Research, London, United Kingdom (schrouff@google.com); Oluwasanmi Koyejo, Google Research, Mountain View, United States (sanmik@google.com)
Pseudocode | Yes | Algorithm 1: Pseudocode of the reduction-to-binary (R2B) algorithm for debiasing multiclass datasets with categorical sensitive attributes of arbitrary cardinality. (An illustrative sketch of this reduction appears after the table.)
Open Source Code | Yes | Source code is publicly available at: https://github.com/google-research/google-research/tree/master/ml_debiaser
Open Datasets | Yes | Adult Income dataset (Kohavi, 1996); COCO dataset (Lin et al., 2014); the healthcare dataset is a subset of the one used in Liu et al. (2020).
Dataset Splits | Yes | Except for the healthcare dataset whose splits are fixed, we split data at random into 25% for test and 75% for training. The healthcare dataset is split according to condition prevalence for training (n = 12,024), tuning for hyper-parameters (n = 1,925), and hold-out testing (n = 1,924). (A split sketch appears after the table.)
Hardware Specification | Yes | Experiments involving neural networks are executed on Tensor Processing Units.
Software Dependencies | No | The paper mentions software such as Scikit-Learn and TensorFlow but does not provide the specific version numbers needed to replicate the experiments.
Experiment Setup | Yes | In R2B, on the other hand, we use a step size of τ = 0.5 in Algorithm 1 and a maximum number of 100 ADMM rounds. We also report results when a single round of R2B is used, which we denote as R2B0. Except for the healthcare dataset, whose splits are fixed, we split data at random into 25% for test and 75% for training. We debias the dataset prior to training and measure performance (e.g., accuracy and DP) on the test split. All methods use the same splits. We re-run every experiment with ten different random seeds and report both the averages and 99% confidence intervals. (A seed-and-confidence-interval sketch appears after the table.)
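
The pseudocode row above points to the sketch below: a minimal, unofficial illustration of the reduction-to-binary idea as described in the abstract and the Algorithm 1 caption. Each one-vs-all label column is debiased by a binary debiaser, and ADMM-style rounds (with the step size τ and round budget quoted in the experiment setup) keep each example's class scores a valid probability distribution. The binary debiaser used here (matching group means) and the simplex projection are stand-ins chosen for illustration, not the authors' implementation; the real code is in the ml_debiaser repository linked above.

```python
import numpy as np


def debias_binary_column(scores, groups):
    """Stand-in binary debiaser: shift each sensitive group's mean score
    to the global mean (a crude demographic-parity correction on one
    one-vs-all column). Not the paper's binary debiaser."""
    out = scores.astype(float).copy()
    global_mean = out.mean()
    for g in np.unique(groups):
        mask = groups == g
        out[mask] += global_mean - out[mask].mean()
    return out


def project_to_simplex(z):
    """Euclidean projection of each row of z onto the probability simplex."""
    n, k = z.shape
    u = np.sort(z, axis=1)[:, ::-1]                      # sort rows descending
    css = np.cumsum(u, axis=1)
    rho = np.sum(u * np.arange(1, k + 1) > (css - 1.0), axis=1) - 1
    theta = (css[np.arange(n), rho] - 1.0) / (rho + 1.0)
    return np.maximum(z - theta[:, None], 0.0)


def r2b_debias(probs, groups, tau=0.5, num_rounds=100):
    """ADMM-style alternation: independent per-class binary debiasing,
    then projection back to valid per-example distributions. tau = 0.5
    and num_rounds = 100 mirror the quoted settings; num_rounds = 1
    corresponds to the single-round variant (R2B0)."""
    z = np.asarray(probs, dtype=float).copy()
    dual = np.zeros_like(z)
    for _ in range(num_rounds):
        # Step 1: debias each one-vs-all class column independently.
        x = np.column_stack([
            debias_binary_column(z[:, c] - dual[:, c], groups)
            for c in range(z.shape[1])
        ])
        # Step 2: restore valid per-example probability distributions.
        z = project_to_simplex(x + dual)
        # Step 3: dual update with step size tau.
        dual += tau * (x - z)
    return z


# Example: 200 examples, 4 classes, 3 sensitive groups (synthetic).
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(4), size=200)
groups = rng.integers(0, 3, size=200)
debiased = r2b_debias(probs, groups)
```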
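For the dataset-splits row, a short sketch of the quoted 75%/25% random split using scikit-learn (which the paper mentions). The arrays X and y are placeholders, not the actual datasets; the healthcare dataset's fixed splits would simply be loaded as given.

```python
import numpy as np
from sklearn.model_selection import train_test_split

seed = 0  # the paper re-runs each experiment with ten different seeds
rng = np.random.default_rng(seed)
X = rng.random((1000, 8))             # placeholder features
y = rng.integers(0, 5, size=1000)     # placeholder 5-class labels

# 75% training / 25% test, as quoted above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=seed)
```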
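For the experiment-setup row, one plausible reading of "averages and 99% confidence intervals" over ten random seeds, assuming a Student-t interval (the paper does not say which construction it uses). The metric values below are illustrative only, not results from the paper.

```python
import numpy as np
from scipy import stats


def mean_and_ci99(values):
    """Mean and 99% confidence-interval half-width via Student's t."""
    values = np.asarray(values, dtype=float)
    half = stats.sem(values) * stats.t.ppf(0.995, df=len(values) - 1)
    return values.mean(), half


# Illustrative per-seed test accuracies (ten seeds), not paper results.
accs = [0.81, 0.79, 0.80, 0.82, 0.80, 0.78, 0.81, 0.80, 0.79, 0.81]
mean, half = mean_and_ci99(accs)
print(f"accuracy: {mean:.3f} +/- {half:.3f} (99% CI)")
```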