Controllable Guarantees for Fair Outcomes via Contrastive Information Estimation

Authors: Umang Gupta, Aaron M Ferber, Bistra Dilkina, Greg Ver Steeg

AAAI 2021, pp. 7610-7619 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We test our approach on UCI Adult and Heritage Health datasets and demonstrate that our approach provides more informative representations across a range of desired parity thresholds while providing strong theoretical guarantees on the parity of any downstream algorithm.' 'Our main contributions are: (a) we theoretically show that mutual information between the representations and the sensitive attributes bounds the statistical parity of any decision algorithm; (b) we propose practical ways to limit mutual information, leveraging contrastive information estimators to efficiently trade off predictability and accuracy.' 'We validate our approach on two datasets: UCI Adult (Dua and Graff 2017) and the Heritage Health dataset. UCI Adult is 1994 census data with 30K samples in the train set and 15K samples in the test set. The target task is to predict whether the income exceeds $50K, and the protected attribute is gender (which is binary in this case). We use the same preprocessing as Moyer et al. (2018). The Heritage Health dataset covers around 51K patients (40K in the train set and 11K in the test set), and the task is to predict the Charlson Index, an indicator of a patient's 10-year survival. We consider age as the protected attribute, which has 9 possible values. We use the same preprocessing as Song et al. (2019).' (An illustrative sketch of the parity gap and a contrastive mutual-information estimate follows the table.)
Researcher Affiliation | Academia | Umang Gupta¹, Aaron M Ferber², Bistra Dilkina², Greg Ver Steeg¹; ¹Information Sciences Institute, University of Southern California; ²University of Southern California; {umanggup, gregv}@isi.edu, {aferber, dilkina}@usc.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | 'The code to reproduce all the experiments is available at https://github.com/umgupta/fairness-via-contrastive-estimation.'
Open Datasets | Yes | 'We evaluate our approach on two fairness benchmark datasets, UCI Adult and Heritage Health, and show that representations provided by our method preserve more information at the desired fairness threshold compared to other adversarial as well as non-adversarial baselines.' 'We validate our approach on two datasets: UCI Adult (Dua and Graff 2017) and the Heritage Health dataset.'
Dataset Splits | No | The paper mentions train and test sets but does not specify a distinct validation split. It states: 'UCI Adult is 1994 census data with 30K samples in the train set and 15K samples in the test set' and 'Heritage Health dataset is data of around 51K patients (40K in the train set and 11K in the test set)'. It also says: 'The held-out test set is used to evaluate representations on downstream tasks only; we use the training set for all the other steps.'
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | 'For a fair comparison, we set d = 8 for all the methods and use model components like encoder, decoder, etc., of the same complexity. We provide a detailed list of the hyperparameter grids, network architectures, and other training setups for all the methods in Appendix D.' (An illustrative encoder/decoder sketch with the shared d = 8 bottleneck follows the table.)
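
The Research Type row quotes the paper's central claim: the mutual information between representations z and sensitive attributes c bounds the statistical parity of any downstream decision algorithm, and contrastive estimators make that mutual information controllable. As a minimal illustration only (not the authors' code; their exact bound and estimators are given in the paper and repository), the sketch below computes the two quantities that claim relates: the statistical parity gap of a downstream classifier, and the standard InfoNCE contrastive estimate of mutual information. Function names, tensor shapes, and the temperature value are assumptions.

```python
# Illustrative sketch only; not the authors' implementation. It computes the
# statistical (demographic) parity gap of a downstream decision algorithm and
# an InfoNCE-style contrastive estimate of I(z; c).
import math

import torch
import torch.nn.functional as F


def parity_gap(y_hat: torch.Tensor, c: torch.Tensor) -> float:
    """Largest difference in positive-decision rate across sensitive groups.

    y_hat: (n,) binary decisions of any downstream algorithm.
    c:     (n,) integer sensitive-attribute labels (e.g., 9 age groups).
    """
    rates = [y_hat[c == g].float().mean() for g in c.unique()]
    return (max(rates) - min(rates)).item()


def info_nce(z: torch.Tensor, c_emb: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE estimate of mutual information over a batch of paired samples.

    z:     (n, d) representations.
    c_emb: (n, d) embeddings of the sensitive attribute for the same examples.
    The estimate is log n minus the contrastive cross-entropy, with the
    positive pair for each row sitting on the diagonal of the similarity matrix.
    """
    logits = (z @ c_emb.t()) / temperature  # (n, n) similarity scores
    labels = torch.arange(z.size(0))        # positives are the (i, i) pairs
    return math.log(z.size(0)) - F.cross_entropy(logits, labels)
```

Per the paper's contribution (a), keeping the estimated mutual information between z and c under a chosen budget is what caps the parity gap of any classifier trained on z; the precise form of the bound is derived in the paper.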
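
For the Experiment Setup row, here is a minimal sketch of the shared configuration it describes, assuming PyTorch MLPs: every compared method uses a d = 8 representation, with encoder and decoder of matched capacity. The hidden width of 64 is an illustrative assumption; the actual architectures and hyperparameter grids are listed in the paper's Appendix D.

```python
# Minimal sketch, assuming PyTorch MLPs; the hidden width is an illustrative
# assumption. Actual architectures are in the paper's Appendix D.
import torch.nn as nn

D = 8  # representation size fixed across all compared methods


def make_encoder(input_dim: int, hidden: int = 64) -> nn.Module:
    return nn.Sequential(
        nn.Linear(input_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, D),  # shared d = 8 bottleneck
    )


def make_decoder(output_dim: int, hidden: int = 64) -> nn.Module:
    # The decoder mirrors the encoder so model components stay of comparable
    # complexity across methods, as the quoted setup requires.
    return nn.Sequential(
        nn.Linear(D, hidden), nn.ReLU(),
        nn.Linear(hidden, output_dim),
    )
```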