Optimized Pre-Processing for Discrimination Prevention
Authors: Flavio Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, Kush R. Varshney
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Two instances of the proposed optimization are applied to real-world datasets, including criminal recidivism data. Results show that discrimination can be greatly reduced at a small cost in classification accuracy. |
| Researcher Affiliation | Collaboration | Flavio P. Calmon (Harvard University, flavio@seas.harvard.edu); Dennis Wei (IBM Research AI, dwei@us.ibm.com); Bhanukiran Vinzamuri (IBM Research AI, bhanu.vinzamuri@ibm.com); Karthikeyan Natesan Ramamurthy (IBM Research AI, knatesa@us.ibm.com); Kush R. Varshney (IBM Research AI, krvarshn@us.ibm.com) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions that the authors implemented LFR themselves, but it provides no statement or link open-sourcing the code for their own proposed method. |
| Open Datasets | Yes | We apply the pipeline to ProPublica's COMPAS recidivism data [ProPublica, 2017] and the UCI Adult dataset [Lichman, 2013]. |
| Dataset Splits | Yes | using 5-fold cross validation. |
| Hardware Specification | No | The paper only states 'All experiments run in minutes on a standard laptop,' which is too general and lacks specific hardware details. |
| Software Dependencies | No | The paper mentions using a 'standard convex solver' (citing CVXPY) and the 'SciPy package' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For both datasets the utility metric is the total variation distance, i.e., $\Delta(p_{X,Y}, p_{\hat{X},\hat{Y}}) = \frac{1}{2}\sum_{x,y} \lvert p_{X,Y}(x,y) - p_{\hat{X},\hat{Y}}(x,y)\rvert$; the distortion constraint is the combination of (2) and (3); and two levels of discrimination control are used, $\epsilon \in \{0.05, 0.1\}$. The distortion function $\delta$ is chosen differently for the two datasets as described below, based on the differing semantics of the variables in the two applications. The parameters for LFR were set as recommended in Zemel et al. [2013]: Az = 50 (group fairness), Ax = 0.01 (individual fairness), and Ay = 1 (prediction accuracy). Once the optimized randomized mapping $p_{\hat{X},\hat{Y}\mid D,X,Y}$ is determined, we apply it to the training set to obtain a new perturbed training set, which is then used to fit two classifiers: logistic regression (LR) and random forest (RF). |
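The utility metric quoted above is the total variation distance between the original and perturbed joint distributions. As a minimal sketch (the joint pmfs below are toy values, not taken from the paper):

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between two joint pmfs given as arrays:
    0.5 * sum over all outcomes of |p - q|."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

# Toy joint distributions over binary (X, Y); rows index x, columns index y.
p_xy = np.array([[0.25, 0.25],
                 [0.25, 0.25]])
q_xy = np.array([[0.30, 0.20],
                 [0.20, 0.30]])

print(total_variation(p_xy, q_xy))  # ~0.1
```

Because the objective sums absolute differences of probabilities, it is linear-program-friendly, which is consistent with the paper's use of a standard convex solver.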
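The last step of the quoted setup — applying the optimized randomized mapping $p_{\hat{X},\hat{Y}\mid D,X,Y}$ to each training record before fitting classifiers — can be sketched as below. The mapping values here are illustrative placeholders (keep each record unchanged with high probability), not the solution of the paper's optimization, and all variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: binary protected attribute D, binary feature X, binary label Y.
# mapping[d, x, y] is a pmf over the 4 perturbed outcomes (x_hat, y_hat),
# encoded as k = 2 * x_hat + y_hat.
mapping = np.empty((2, 2, 2, 4))
for d in range(2):
    for x in range(2):
        for y in range(2):
            pmf = np.full(4, 0.05)       # small mass on each alternative
            pmf[2 * x + y] = 0.85        # keep (x, y) unchanged most of the time
            mapping[d, x, y] = pmf

def perturb(D, X, Y):
    """Draw (x_hat, y_hat) for each record from p(x_hat, y_hat | d, x, y)."""
    out = np.empty((len(D), 2), dtype=int)
    for i, (d, x, y) in enumerate(zip(D, X, Y)):
        k = rng.choice(4, p=mapping[d, x, y])
        out[i] = divmod(k, 2)            # decode k back into (x_hat, y_hat)
    return out[:, 0], out[:, 1]

# Synthetic training data; in the paper, classifiers (LR and RF) would then be
# fit on (X_hat, Y_hat) instead of the original (X, Y).
D = rng.integers(0, 2, size=1000)
X = rng.integers(0, 2, size=1000)
Y = rng.integers(0, 2, size=1000)
X_hat, Y_hat = perturb(D, X, Y)
```

Because the mapping is applied row by row, the perturbed training set has the same size and schema as the original, so any off-the-shelf classifier can be trained on it unchanged.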