Fair Classification with Noisy Protected Attributes: A Framework with Provable Guarantees

Authors: L. Elisa Celis, Lingxiao Huang, Vijay Keswani, Nisheeth K. Vishnoi

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show that our framework can be used to attain either statistical rate or false positive rate fairness guarantees with a minimal loss in accuracy, even when the noise is large, in two real-world datasets. We implement our denoised program, for binary and nonbinary protected attributes, and compare the performance with baseline algorithms on real-world datasets.
Researcher Affiliation | Academia | (1) Department of Statistics and Data Science, Yale University, USA; (2) Tsinghua University, China; (3) Department of Computer Science, Yale University, USA.
Pseudocode | No | The paper describes algorithms and programs (e.g., Program Target Fair, Program DFair) but does not provide them in a structured pseudocode or algorithm block format.
Open Source Code | Yes | Code available at github.com/vijaykeswani/Noisy-Fair-Classification.
Open Datasets | Yes | We perform simulations on the Adult (Asuncion & Newman, 2007) and COMPAS (Angwin et al., 2016b) benchmark datasets, as pre-processed in AIF360 toolkit (Bellamy et al., 2018b).
Dataset Splits | No | We first shuffle and partition the dataset into a train and test partition (70-30 split). The paper does not explicitly mention a validation set split.
Hardware Specification | No | The paper describes experimental simulations and comparisons but does not provide any specific hardware details used for running the experiments.
Software Dependencies | No | The paper mentions the 'AIF360 toolkit' and the 'SLSQP' solver, but does not provide specific version numbers for any software dependencies. (An illustrative solver call is sketched after this table.)
Experiment Setup | Yes | We first shuffle and partition the dataset into a train and test partition (70-30 split). For binary protected attributes, we use η0 = 0.3 and η1 = 0.1. For non-binary protected attributes, we use the noise matrix [...] For COMPAS, we use λ = 0.1 as a large fraction (47%) of training samples have class label 1, while for Adult, we use λ = 0 as the fraction of positive class labels is small (24%). (A setup sketch follows after this table.)
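
The Experiment Setup row quotes a 70-30 train/test split and group-dependent noise rates η0 = 0.3 and η1 = 0.1 for binary protected attributes. Below is a minimal sketch, not the authors' released code, of that setup: it assumes AIF360 is installed with its Adult data files in place, and it treats 'sex' as the binary protected attribute purely for illustration.

```python
# A minimal sketch (not the authors' released code) of the reported setup:
# load the AIF360-preprocessed Adult data, make a 70-30 train/test split,
# and flip the binary protected attribute with the group-dependent noise
# rates eta0 = 0.3 and eta1 = 0.1 quoted in the Experiment Setup row.
import numpy as np
from aif360.datasets import AdultDataset  # assumes AIF360 and its raw data files are installed

rng = np.random.default_rng(0)

dataset = AdultDataset()                            # Adult benchmark, AIF360 preprocessing
train, test = dataset.split([0.7], shuffle=True)    # 70-30 split, as reported

# 'sex' is assumed here to be the binary protected attribute under study.
idx = train.protected_attribute_names.index('sex')
z = train.protected_attributes[:, idx].astype(int)

# Flip each group's attribute value with its own noise rate.
eta0, eta1 = 0.3, 0.1
flip_prob = np.where(z == 0, eta0, eta1)
flip = rng.random(len(z)) < flip_prob
train.protected_attributes[:, idx] = np.where(flip, 1 - z, z)
# (In AIF360 the protected attribute may also appear among train.features;
#  a full pipeline would update that copy as well.)
```

The non-binary noise matrix and the λ values quoted in the row are parameters of the authors' program and are not reproduced in this sketch.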
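
The Software Dependencies row notes that the paper relies on the SLSQP solver without pinning a version. Purely to illustrate that kind of solver call, and not the paper's Program DFair itself, the following sketch fits a toy logistic model on synthetic data subject to a statistical-rate-style constraint using SciPy's SLSQP method; the data, constraint form, and threshold τ = 0.8 are all assumptions made for the example.

```python
# Illustrative only: a toy fairness-constrained optimization solved with SLSQP.
# The synthetic data and constraint form are assumptions, not the paper's program.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                 # toy features
z = rng.integers(0, 2, size=200)              # toy binary protected attribute
y = (X[:, 0] + 0.5 * z + rng.normal(scale=0.5, size=200) > 0).astype(float)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30, 30)))

def log_loss(w):
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def statistical_rate(w, tau=0.8):
    # Soft acceptance rate per group; SLSQP 'ineq' constraints require fun(w) >= 0.
    p = sigmoid(X @ w)
    r0, r1 = p[z == 0].mean(), p[z == 1].mean()
    return min(r0, r1) / max(r0, r1) - tau

res = minimize(log_loss, x0=np.zeros(X.shape[1]), method='SLSQP',
               constraints=[{'type': 'ineq', 'fun': statistical_rate}])
print(res.success, res.x)
```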