Hunting for Discriminatory Proxies in Linear Regression Models

Authors: Samuel Yeom, Anupam Datta, Matt Fredrikson

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we present empirical results on two law enforcement datasets that exhibit varying degrees of racial disparity in prediction outcomes, demonstrating that proxies shed useful light on the causes of discriminatory behavior in models." and "Finally, in Section 5 we evaluate our algorithm with two real-world predictive policing applications."
Researcher Affiliation | Academia | Samuel Yeom, Carnegie Mellon University, syeom@cs.cmu.edu; Anupam Datta, Carnegie Mellon University, danupam@cmu.edu; Matt Fredrikson, Carnegie Mellon University, mfredrik@cs.cmu.edu
Pseudocode | No | The paper presents optimization problems (Problem 1 and Problem 2) but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using the 'cvxopt package [2] in Python' and provides its URL (http://cvxopt.org), but this is a third-party tool used by the authors, not a statement that they are releasing their own source code for the methodology presented.
Open Datasets | Yes | "We ran our proxy detection algorithms on observational data from Chicago's Strategic Subject List (SSL) model [9] and the Communities and Crimes (C&C) dataset [15]." and references [9] 'City of Chicago. Strategic Subject List. https://data.cityofchicago.org/Public-Safety/Strategic-Subject-List/4aki-r3np, 2017.' and [15] 'UCI machine learning repository. https://archive.ics.uci.edu/ml, 2017.'
Dataset Splits | No | The paper mentions using datasets for evaluation and training a linear regression model, but it does not provide specific details on train/validation/test splits, percentages, or sample counts needed to reproduce the data partitioning.
Hardware Specification | No | The paper states that the algorithms were implemented 'with the cvxopt package [2] in Python' but does not provide any specific hardware details such as GPU or CPU models, memory specifications, or cloud resources used for the experiments.
Software Dependencies | No | The paper states, 'We implemented Problems 1 and 2 with the cvxopt package [2] in Python.' While it names the software, it does not specify version numbers for either 'cvxopt' or 'Python', which is required for reproducibility.
Experiment Setup | Yes | "For example, one proxy consisting of 58 of the 90 input variables achieves an influence of 0.34 when ϵ = 0.85." and "The strengths of the proxies for race are given in Table 1. The estimated influence was computed as (c^T α)^2 / Var(Ŷ)."
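The estimated-influence formula quoted in the Experiment Setup row can be illustrated with a short Python sketch. This summary does not define c or α, nor how Var(Ŷ) was estimated, so the function below treats c and alpha as equal-length numeric vectors and uses the population variance of a list of model outputs. The function name and the toy values are hypothetical and do not come from the paper.

```python
from statistics import pvariance


def estimated_influence(c, alpha, y_hat):
    """Evaluate (c^T alpha)^2 / Var(Y_hat).

    c, alpha: equal-length sequences of numbers (their meaning is not
              defined in this summary; treated here as plain vectors).
    y_hat:    sequence of model outputs; Var(Y_hat) is taken to be the
              population variance, which is an assumption of this sketch.
    """
    if len(c) != len(alpha):
        raise ValueError("c and alpha must have the same length")
    var_y = pvariance(y_hat)
    if var_y == 0:
        raise ValueError("Var(Y_hat) is zero; the ratio is undefined")
    # Inner product c^T alpha
    dot = sum(ci * ai for ci, ai in zip(c, alpha))
    return dot ** 2 / var_y


# Toy values chosen only so the arithmetic is easy to check by hand:
# c^T alpha = 1*0.5 + 2*0.25 = 1.0 and Var([0, 2]) = 1.0
print(estimated_influence([1.0, 2.0], [0.5, 0.25], [0.0, 2.0]))  # 1.0
```

With real data, y_hat would be the linear model's predictions on the dataset, and the choice between population and sample variance should follow whatever the original experiments used.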