Fair Classification with Noisy Protected Attributes: A Framework with Provable Guarantees
Authors: L. Elisa Celis, Lingxiao Huang, Vijay Keswani, Nisheeth K. Vishnoi
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that our framework can be used to attain either statistical rate or false positive rate fairness guarantees with a minimal loss in accuracy, even when the noise is large, in two real-world datasets. We implement our denoised program, for binary and nonbinary protected attributes, and compare the performance with baseline algorithms on real-world datasets. |
| Researcher Affiliation | Academia | Department of Statistics and Data Science, Yale University, USA; Tsinghua University, China; Department of Computer Science, Yale University, USA. |
| Pseudocode | No | The paper describes algorithms and programs (e.g., Program Target Fair, Program DFair) but does not provide them in a structured pseudocode or algorithm block format. |
| Open Source Code | Yes | Code available at github.com/vijaykeswani/Noisy-Fair-Classification. |
| Open Datasets | Yes | We perform simulations on the Adult (Asuncion & Newman, 2007) and COMPAS (Angwin et al., 2016b) benchmark datasets, as pre-processed in AIF360 toolkit (Bellamy et al., 2018b). |
| Dataset Splits | No | We first shuffle and partition the dataset into a train and test partition (70-30 split). The paper does not explicitly mention a validation set split. |
| Hardware Specification | No | The paper describes experimental simulations and comparisons but does not provide any specific hardware details used for running the experiments. |
| Software Dependencies | No | The paper mentions the 'AIF360 toolkit' and 'SLSQP' solver, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We first shuffle and partition the dataset into a train and test partition (70-30 split). For binary protected attributes, we use η0 = 0.3 and η1 = 0.1. For non-binary protected attributes, we use the noise matrix [...] For COMPAS, we use λ=0.1 as a large fraction (47%) of training samples have class label 1, while for Adult, we use λ=0 as the fraction of positive class labels is small (24%). |
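The statistical rate and false positive rate guarantees mentioned in the Research Type row are, in this line of work, usually stated as ratio-form group-fairness metrics. The block below sketches those standard definitions for a classifier $f$, features $X$, protected attribute $Z$, and true label $Y$; this is the conventional notation, not necessarily the paper's exact formulation.

```latex
% Sketch of the usual ratio-form definitions of the two fairness metrics
% referenced above (standard conventions, not necessarily the paper's notation).
\[
  \mathrm{SR}(f) \;=\; \min_{z,\,z'} \frac{\Pr[f(X)=1 \mid Z=z]}{\Pr[f(X)=1 \mid Z=z']},
  \qquad
  \mathrm{FPR}(f) \;=\; \min_{z,\,z'} \frac{\Pr[f(X)=1 \mid Z=z,\, Y=0]}{\Pr[f(X)=1 \mid Z=z',\, Y=0]}.
\]
```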
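The Dataset Splits and Experiment Setup rows together read as a short preprocessing recipe. The sketch below is a hypothetical reconstruction of that recipe for the binary-attribute case, assuming AIF360's standard `AdultDataset`/`CompasDataset` loaders and scikit-learn's `train_test_split`; the noise rates η0 = 0.3 and η1 = 0.1 are taken from the table, while all variable names, the random seed, and the choice of protected-attribute column are illustrative and not taken from the paper's code.

```python
# Hypothetical reconstruction of the reported setup: load a benchmark dataset
# through AIF360, shuffle and split it 70-30, then flip the binary protected
# attribute with group-dependent noise rates eta0 = 0.3 and eta1 = 0.1.
import numpy as np
from aif360.datasets import AdultDataset  # CompasDataset is loaded analogously
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def flip_protected(z, eta0=0.3, eta1=0.1, rng=rng):
    """Flip each binary protected attribute z_i with probability eta0 when
    z_i == 0 and eta1 when z_i == 1, producing the 'noisy' attribute."""
    z = np.asarray(z).astype(int)
    flip_prob = np.where(z == 0, eta0, eta1)
    flips = rng.random(z.shape) < flip_prob
    return np.where(flips, 1 - z, z)

dataset = AdultDataset()                    # features, labels, protected attributes
X = dataset.features
y = dataset.labels.ravel()
z = dataset.protected_attributes[:, 0]      # one binary attribute, e.g. 'sex' or 'race'

# 70-30 shuffled train/test partition, as reported in the Experiment Setup row.
X_tr, X_te, y_tr, y_te, z_tr, z_te = train_test_split(
    X, y, z, test_size=0.3, shuffle=True, random_state=0)

# The classifier is trained using only the noisy protected attribute.
z_tr_noisy = flip_protected(z_tr)
```

The non-binary case would replace `flip_protected` with sampling from the paper's full noise matrix, which is elided in the quote above and is therefore not reproduced here.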