How Far Can Fairness Constraints Help Recover From Biased Data?
Authors: Mohit Sharma, Amit Deshpande
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose a general approach to extend the result of Blum & Stangl (2019) to different fairness constraints, data bias models, data distributions, and hypothesis classes. We strengthen their result, and extend it to the case when their stylized distribution has labels with Massart noise instead of i.i.d. noise. We prove a similar recovery result for arbitrary data distributions using fair reject option classifiers. We further generalize it to arbitrary data distributions and arbitrary hypothesis classes, i.e., we prove that for any data distribution, if the optimally accurate classifier in a given hypothesis class is fair and robust, then it can be recovered through fair classification with equal opportunity constraints on the biased distribution whenever the bias parameters satisfy certain simple conditions. |
| Researcher Affiliation | Collaboration | Work done during internship at Microsoft Research India. Affiliations: (1) Indraprastha Institute of Information Technology, Delhi, India; (2) Microsoft Research India. Correspondence to: Mohit Sharma <mohits@iiitd.ac.in>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It focuses on theoretical derivations and proofs. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. There is no mention of code release, repository links, or code in supplementary materials. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on specific datasets. It refers to 'stylized data distribution' or 'arbitrary data distributions' for theoretical analysis rather than for empirical evaluation with publicly accessible data. |
| Dataset Splits | No | The paper is theoretical and does not present empirical experiments. Therefore, it does not provide specific dataset split information for training, validation, or testing. |
| Hardware Specification | No | The paper is purely theoretical and does not describe any experimental setup or hardware used for computation. |
| Software Dependencies | No | The paper is purely theoretical and does not describe any software implementation details or dependencies with version numbers. |
| Experiment Setup | No | The paper is purely theoretical and does not describe any experimental setup details, concrete hyperparameter values, or training configurations. |
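
The paper itself releases no code (see the rows above). As a point of reference for the "Research Type" summary, which centers on fair classification under equal opportunity constraints, the following is a minimal, illustrative sketch of how an equal opportunity violation is commonly measured: as the gap in true positive rates between two groups. The function name, toy data, and NumPy implementation are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def equal_opportunity_gap(y_true, y_pred, group):
    """Absolute difference in true positive rates between two groups.

    Equal opportunity asks that the true positive rate be the same
    across groups; a gap of 0 means the constraint holds exactly.
    All arrays are 1-D and binary-valued (group takes values 0/1).
    Illustrative helper only; not code from the paper.
    """
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tprs = []
    for g in (0, 1):
        # Positive examples belonging to group g.
        positives = (y_true == 1) & (group == g)
        tprs.append(y_pred[positives].mean() if positives.any() else 0.0)
    return abs(tprs[0] - tprs[1])

# Toy usage: group 0 gets a TPR of 0.5, group 1 a TPR of 1.0.
y_true = np.array([1, 1, 1, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 1])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(equal_opportunity_gap(y_true, y_pred, group))  # prints 0.5
```

A gap of zero corresponds to a classifier that exactly satisfies equal opportunity; the recovery results summarized in the "Research Type" row concern classifiers constrained to (approximately) satisfy this condition on the biased distribution.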