Fair Inference on Outcomes

Authors: Razieh Nabi, Ilya Shpitser

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first illustrate our approach to fair inference via two datasets: the COMPAS dataset (Angwin et al. 2016) and the Adult dataset (Lichman 2013). We also illustrate how a part of the model involving the outcome Y may be regularized without compromising fair inferences if the NDE quantifying discrimination is estimated using methods that are robust to misspecification of the Y model. ... Using unconstrained BART, our prediction accuracy on the test set was 67.8%, removing treatment from the outcome model dropped the accuracy to 64.0%, and using constrained BART lead to the accuracy of 66.4%. ... Accuracy in the unconstrained model is the highest, 82%, and the lowest in the drop A scenario, 42%, (as expected). The constrained model not only boosts accuracy to 72%, but also guarantees fairness, in our sense. (Illustrative sketches of the constrained fit and of the unconstrained-versus-drop-A comparison follow the table.)
Researcher Affiliation | Academia | Razieh Nabi, Ilya Shpitser, Computer Science Department, Johns Hopkins University, {rnabiab1@, ilyas@cs}.jhu.edu
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper states 'We implemented this simple method by modifying the R package (with a C++ backend) BayesTree', but it does not provide a statement or link indicating that the modified code has been open-sourced.
Open Datasets | Yes | We first illustrate our approach to fair inference via two datasets: the COMPAS dataset (Angwin et al. 2016) and the Adult dataset (Lichman 2013). ... The adult dataset from the UCI repository has records on 14 attributes such as demographic information, level of education, and job related variables such as occupation and work class on 48842 instances along with their income... Lichman, M. 2013. UCI machine learning repository. https://archive.ics.uci.edu/ml/datasets/adult.
Dataset Splits | No | The paper mentions 'split the data into training and validation sets' for a simulation study, but it does not specify exact percentages or counts for these splits. For the main datasets (COMPAS, Adult), only test-set evaluation is explicitly mentioned, without details of the training or validation splits.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions the 'R package (with a C++ backend) BayesTree' but does not provide version numbers for R, the C++ toolchain, or the BayesTree package.
Experiment Setup | Yes | The Y model is trained by maximizing the constrained likelihood in (4) using the R package nloptr. ... We generated 4000 data points using the models shown in (10) and split the data into training and validation sets. ... We assume A and M models are correctly specified; A is randomized (like race or gender) and M has a logistic regression model with interaction terms. ... If we use logistic regression to model Y and linear regression to model other variables given their past...
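The constrained-likelihood step quoted in the Experiment Setup row can be illustrated with a minimal, self-contained R sketch. The toy data-generating process, the variable names (A, M, C, Y), the mean-scale NDE bound eps, and the plug-in NDE estimator below are illustrative assumptions, not the paper's equations (4) and (10); the sketch only shows the general pattern of maximizing a logistic-regression likelihood for Y with nloptr while an inequality constraint keeps the estimated natural direct effect of A on Y near zero.

library(nloptr)

set.seed(1)
n <- 2000
C <- rnorm(n)                                        # baseline covariate
A <- rbinom(n, 1, 0.5)                               # randomized sensitive attribute
M <- rbinom(n, 1, plogis(-0.5 + 1.0 * A + 0.5 * C))  # mediator
Y <- rbinom(n, 1, plogis(-0.3 + 0.8 * A + 0.7 * M + 0.4 * C))

# Mediator model, fit by ordinary (unconstrained) maximum likelihood.
m_fit <- glm(M ~ A + C, family = binomial)

# Negative log-likelihood of a logistic outcome model with beta = (intercept, A, M, C).
neg_loglik <- function(beta) {
  eta <- beta[1] + beta[2] * A + beta[3] * M + beta[4] * C
  -sum(Y * eta - log1p(exp(eta)))
}

# Plug-in estimate of the natural direct effect on the mean scale,
# E[Y(1, M(0))] - E[Y(0, M(0))], averaging over the empirical covariates.
nde_hat <- function(beta) {
  pm1 <- predict(m_fit, newdata = data.frame(A = 0, C = C), type = "response")
  ey  <- function(a, m) plogis(beta[1] + beta[2] * a + beta[3] * m + beta[4] * C)
  mean((ey(1, 1) - ey(0, 1)) * pm1 + (ey(1, 0) - ey(0, 0)) * (1 - pm1))
}

# Inequality constraints g(beta) <= 0 that keep the estimated NDE inside [-eps, eps].
eps <- 0.05
nde_constraint <- function(beta) c(nde_hat(beta) - eps, -nde_hat(beta) - eps)

fit <- nloptr(
  x0          = rep(0, 4),
  eval_f      = neg_loglik,
  eval_g_ineq = nde_constraint,
  opts        = list(algorithm = "NLOPT_LN_COBYLA", xtol_rel = 1e-8, maxeval = 10000)
)
fit$solution  # constrained coefficient estimates for the outcome model

COBYLA is used here only because it handles nonlinear inequality constraints without gradients; the paper does not state which nloptr algorithm was actually used.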
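For the BART comparison quoted in the Research Type row, the constrained variant relied on the authors' modified BayesTree sampler, which (per the Open Source Code row) has not been released, so it is not reproduced here. The sketch below only mimics the two unconstrained baselines, a full outcome model versus one that simply drops the sensitive attribute A, on synthetic stand-in data; the data-generating process and the 70/30 split are assumptions for illustration, not the COMPAS or Adult setup.

library(BayesTree)

set.seed(2)
# Toy stand-in data: a binary sensitive attribute A, two covariates, and a
# binary outcome Y (the paper's experiments used the COMPAS and Adult data).
n  <- 1000
df <- data.frame(A = rbinom(n, 1, 0.5), X1 = rnorm(n), X2 = rnorm(n))
df$Y  <- rbinom(n, 1, plogis(-0.2 + 0.9 * df$A + 0.6 * df$X1 - 0.4 * df$X2))
idx   <- sample(n, 0.7 * n)
train <- df[idx, ]
test  <- df[-idx, ]

# Fit probit BART on the chosen predictors and report test-set accuracy.
fit_and_score <- function(keep) {
  fit   <- bart(x.train = as.matrix(train[, keep]), y.train = train$Y,
                x.test  = as.matrix(test[, keep]))
  p_hat <- colMeans(pnorm(fit$yhat.test))  # posterior mean of P(Y = 1 | x)
  mean((p_hat > 0.5) == test$Y)
}

acc_full   <- fit_and_score(c("A", "X1", "X2"))  # unconstrained outcome model
acc_drop_A <- fit_and_score(c("X1", "X2"))       # naively drop the sensitive attribute
c(unconstrained = acc_full, drop_A = acc_drop_A)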