A Reduction to Binary Approach for Debiasing Multiclass Datasets

Authors: Ibrahim M. Alabdulmohsin, Jessica Schrouff, Sanmi Koyejo

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We prove that R2B satisfies optimality and bias guarantees and demonstrate empirically that it can lead to an improvement over two baselines: (1) treating multiclass problems as multi-label by debiasing labels independently and (2) transforming the features instead of the labels. Surprisingly, we also demonstrate that independent label debiasing yields competitive results in most (but not all) settings. We validate these conclusions on synthetic and real-world datasets from social science, computer vision, and healthcare.
Researcher Affiliation | Industry | Ibrahim Alabdulmohsin, Google Research, Zürich, Switzerland (ibomohsin@google.com); Jessica Schrouff, Google Research, London, United Kingdom (schrouff@google.com); Oluwasanmi Koyejo, Google Research, Mountain View, United States (sanmik@google.com)
Pseudocode | Yes | Algorithm 1: Pseudocode of the reduction-to-binary (R2B) algorithm for debiasing multiclass datasets with categorical sensitive attributes of arbitrary cardinality. (An illustrative sketch of this reduction appears after the table.)
Open Source Code | Yes | Source code is publicly available at: https://github.com/google-research/google-research/tree/master/ml_debiaser
Open Datasets | Yes | Adult Income dataset (Kohavi, 1996); COCO dataset (Lin et al., 2014); the healthcare dataset is a subset of the one used in Liu et al. (2020).
Dataset Splits | Yes | Except for the healthcare dataset whose splits are fixed, we split data at random into 25% for test and 75% for training. The healthcare dataset is split according to condition prevalence for training (n = 12,024), tuning for hyper-parameters (n = 1,925), and hold-out testing (n = 1,924). (A split sketch appears after the table.)
Hardware Specification | Yes | Experiments involving neural networks are executed on Tensor Processing Units.
Software Dependencies | No | The paper mentions software such as Scikit-Learn and TensorFlow but does not provide the specific version numbers needed to replicate the experiments.
Experiment Setup | Yes | In R2B, on the other hand, we use a step size of τ = 0.5 in Algorithm 1 and a maximum number of 100 ADMM rounds. We also report results when a single round of R2B is used, which we denote as R2B0. Except for the healthcare dataset, whose splits are fixed, we split data at random into 25% for test and 75% for training. We debias the dataset prior to training and measure performance (e.g., accuracy and DP) on the test split. All methods use the same splits. We re-run every experiment with ten different random seeds and report both the averages and 99% confidence intervals. (A seed-and-confidence-interval sketch appears after the table.)
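
The pseudocode row above points to the sketch below: a minimal, unofficial illustration of the reduction-to-binary idea as described in the abstract and the Algorithm 1 caption. Each one-vs-all label column is debiased by a binary debiaser, and ADMM-style rounds (with the step size τ and round budget quoted in the experiment setup) keep each example's class scores a valid probability distribution. The binary debiaser used here (matching group means) and the simplex projection are stand-ins chosen for illustration, not the authors' implementation; the real code is in the ml_debiaser repository linked above.

```python
import numpy as np


def debias_binary_column(scores, groups):
    """Stand-in binary debiaser: shift each sensitive group's mean score
    to the global mean (a crude demographic-parity correction on one
    one-vs-all column). Not the paper's binary debiaser."""
    out = scores.astype(float).copy()
    global_mean = out.mean()
    for g in np.unique(groups):
        mask = groups == g
        out[mask] += global_mean - out[mask].mean()
    return out


def project_to_simplex(z):
    """Euclidean projection of each row of z onto the probability simplex."""
    n, k = z.shape
    u = np.sort(z, axis=1)[:, ::-1]                      # sort rows descending
    css = np.cumsum(u, axis=1)
    rho = np.sum(u * np.arange(1, k + 1) > (css - 1.0), axis=1) - 1
    theta = (css[np.arange(n), rho] - 1.0) / (rho + 1.0)
    return np.maximum(z - theta[:, None], 0.0)


def r2b_debias(probs, groups, tau=0.5, num_rounds=100):
    """ADMM-style alternation: independent per-class binary debiasing,
    then projection back to valid per-example distributions. tau = 0.5
    and num_rounds = 100 mirror the quoted settings; num_rounds = 1
    corresponds to the single-round variant (R2B0)."""
    z = np.asarray(probs, dtype=float).copy()
    dual = np.zeros_like(z)
    for _ in range(num_rounds):
        # Step 1: debias each one-vs-all class column independently.
        x = np.column_stack([
            debias_binary_column(z[:, c] - dual[:, c], groups)
            for c in range(z.shape[1])
        ])
        # Step 2: restore valid per-example probability distributions.
        z = project_to_simplex(x + dual)
        # Step 3: dual update with step size tau.
        dual += tau * (x - z)
    return z


# Example: 200 examples, 4 classes, 3 sensitive groups (synthetic).
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(4), size=200)
groups = rng.integers(0, 3, size=200)
debiased = r2b_debias(probs, groups)
```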
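For the dataset-splits row, a short sketch of the quoted 75%/25% random split using scikit-learn (which the paper mentions). The arrays X and y are placeholders, not the actual datasets; the healthcare dataset's fixed splits would simply be loaded as given.

```python
import numpy as np
from sklearn.model_selection import train_test_split

seed = 0  # the paper re-runs each experiment with ten different seeds
rng = np.random.default_rng(seed)
X = rng.random((1000, 8))             # placeholder features
y = rng.integers(0, 5, size=1000)     # placeholder 5-class labels

# 75% training / 25% test, as quoted above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=seed)
```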
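For the experiment-setup row, one plausible reading of "averages and 99% confidence intervals" over ten random seeds, assuming a Student-t interval (the paper does not say which construction it uses). The metric values below are illustrative only, not results from the paper.

```python
import numpy as np
from scipy import stats


def mean_and_ci99(values):
    """Mean and 99% confidence-interval half-width via Student's t."""
    values = np.asarray(values, dtype=float)
    half = stats.sem(values) * stats.t.ppf(0.995, df=len(values) - 1)
    return values.mean(), half


# Illustrative per-seed test accuracies (ten seeds), not paper results.
accs = [0.81, 0.79, 0.80, 0.82, 0.80, 0.78, 0.81, 0.80, 0.79, 0.81]
mean, half = mean_and_ci99(accs)
print(f"accuracy: {mean:.3f} +/- {half:.3f} (99% CI)")
```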