Assessing Fairness in the Presence of Missing Data
Authors: Yiliang Zhang, Qi Long
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide upper and lower bounds on the fairness estimation error and conduct numerical experiments to assess our theoretical results. Our work provides the first known theoretical results on fairness guarantee in analysis of incomplete data. |
| Researcher Affiliation | Academia | Yiliang Zhang, University of Pennsylvania, Philadelphia, PA 19104, USA (zylthu14@sas.upenn.edu); Qi Long, University of Pennsylvania, Philadelphia, PA 19104, USA (qlong@upenn.edu) |
| Pseudocode | No | The paper describes algorithms and methods in text and mathematical formulations but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | We conduct analyses of two real datasets, one from COMPAS and the other from ADNI. The COMPAS dataset analyzed in this work contains records of defendants from Broward County from 2013 and 2014. The dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) contains gene expression and clinical data for 649 patients. |
| Dataset Splits | Yes | In each experiment, we randomly split the real dataset into two subsets. In the first subset, we generate missing values, and the complete cases in this subset are used to train a random forest prediction model g and estimate its fairness in the complete data domain. The true fairness T(g) is approximated using the entire second subset. (A sketch of this protocol appears after the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory specifications). |
| Software Dependencies | No | The paper mentions using statistical models and algorithms like 'logistic regression', 'random forest', 'support vector machine', and 'XGBoost', but it does not specify any version numbers for these software components or libraries. |
| Experiment Setup | Yes | In our simulation experiments, we assess the upper bound in Theorem 1 in a classification task and the lower bound in Theorem 2 in a regression task. In each experiment, we generate 10 predictors and a binary sensitive attribute $A \in \{0, 1\}$ with $n$ samples. Unless noted otherwise, the predictors are generated from Gaussian distributions: $x_{ij} \sim N(1 - 2A_i, 0.5^2)$. We use a set of 2000 data [...] to train a prediction algorithm $g$, with linear SVM as the prediction model. We vary the total sample size $n$ from $10^3$ to $10^5$, and for each fixed $n$ we examine different levels of sample imbalance between the two sensitive groups by varying the ratio $n_1/n_0$ from 1 to 20. Missingness is generated under MAR using the model $\operatorname{logit}(\pi(z_i, A_i)) = 2 - \tfrac{1}{5}\sum_{j=1}^{10} x_{ij}$. (A sketch of this setup appears below.) |
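
The simulation setup quoted in the last row lends itself to a short sketch. The following is a minimal reconstruction under stated assumptions, not the authors' code: the label model for `y` is hypothetical, and we apply the MAR mechanism at the record level (the excerpt does not say whether missingness is per record or per variable).

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def simulate(n0, n1, p=10):
    # Binary sensitive attribute A in {0, 1}; predictors x_ij ~ N(1 - 2*A_i, 0.5^2),
    # as quoted in the Experiment Setup row.
    A = np.repeat([0, 1], [n0, n1])
    X = rng.normal(loc=1 - 2 * A[:, None], scale=0.5, size=(n0 + n1, p))
    return X, A

def mar_observed(X):
    # Quoted MAR mechanism: logit(pi_i) = 2 - (1/5) * sum_j x_ij.
    # Here pi_i is treated as the probability that record i is fully
    # observed -- a simplifying assumption on our part.
    logit = 2.0 - X.sum(axis=1) / 5.0
    pi = 1.0 / (1.0 + np.exp(-logit))
    return rng.random(X.shape[0]) < pi

# 2000 samples to train the linear-SVM prediction model g, as in the setup.
X, A = simulate(n0=1000, n1=1000)
y = (X.mean(axis=1) + rng.normal(scale=0.5, size=X.shape[0]) > 0).astype(int)  # hypothetical labels
obs = mar_observed(X)
g = LinearSVC().fit(X[obs], y[obs])
```

Varying `n0` and `n1` reproduces the sample-imbalance sweep (ratio $n_1/n_0$ from 1 to 20) described in the row above.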
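
The split-and-evaluate protocol from the Dataset Splits row can be sketched similarly. The fairness metric below (demographic-parity gap) and all helper names are our own illustrative choices; the paper studies fairness estimation generally and does not fix this specific metric in the excerpt.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def dp_gap(model, X, A):
    # Demographic-parity gap |P(g(x)=1 | A=0) - P(g(x)=1 | A=1)|,
    # one of several group-fairness metrics; purely for illustration.
    yhat = model.predict(X)
    return abs(yhat[A == 0].mean() - yhat[A == 1].mean())

def split_and_evaluate(X, y, A, observed_mask_fn, seed=0):
    # Randomly split the dataset into two subsets.
    X1, X2, y1, y2, A1, A2 = train_test_split(
        X, y, A, test_size=0.5, random_state=seed
    )
    # Generate missingness in the first subset; keep complete cases only,
    # and train the random forest prediction model g on them.
    obs = observed_mask_fn(X1)
    g = RandomForestClassifier(random_state=seed).fit(X1[obs], y1[obs])
    # Fairness estimated in the complete-data domain ...
    fairness_complete_cases = dp_gap(g, X1[obs], A1[obs])
    # ... versus the "true" fairness T(g), approximated on the second subset.
    fairness_true = dp_gap(g, X2, A2)
    return fairness_complete_cases, fairness_true
```

With the simulated `X`, `y`, `A`, and `mar_observed` from the previous sketch, `split_and_evaluate(X, y, A, mar_observed)` returns the complete-case fairness estimate alongside the approximation of the true fairness.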