Fair regression with Wasserstein barycenters
Authors: Evgenii Chzhen, Christophe Denis, Mohamed Hebiri, Luca Oneto, Massimiliano Pontil
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments indicate that our method is very effective in learning fair models, with a relative increase in error rate that is inferior to the relative gain in fairness. [5 Empirical study] In this section, we present numerical experiments with the proposed fair regression estimator defined in Section 3. In all experiments, we collect statistics on the test set $\mathcal{T} = \{(X_i, S_i, Y_i)\}_{i=1}^{n_{\text{test}}}$. The empirical mean squared error (MSE) is defined as $\mathrm{MSE}(g) = \frac{1}{n_{\text{test}}} \sum_{(X,S,Y) \in \mathcal{T}} (Y - g(X,S))^2$. We also measure the violation of the fairness constraint imposed by Definition 2.2 via the empirical Kolmogorov-Smirnov (KS) statistic, $\mathrm{KS}(g) = \max_{s, s' \in \mathcal{S}} \sup_{t \in \mathbb{R}} \big\lvert \frac{1}{\lvert\mathcal{T}_s\rvert} \sum_{(X,S,Y) \in \mathcal{T}_s} \mathbb{1}\{g(X,S) \le t\} - \frac{1}{\lvert\mathcal{T}_{s'}\rvert} \sum_{(X,S,Y) \in \mathcal{T}_{s'}} \mathbb{1}\{g(X,S) \le t\} \big\rvert$, where for all $s \in \mathcal{S}$ we define the set $\mathcal{T}_s = \{(X, S, Y) \in \mathcal{T} : S = s\}$. For all datasets we split the data in two parts (70% train and 30% test); this procedure is repeated 30 times, and we report the average performance on the test set alongside its standard deviation. We employ the 2-step 10-fold CV procedure considered by [17] to select the best hyperparameters with the training set. *(A sketch of these metrics appears after the table.)* |
| Researcher Affiliation | Academia | Evgenii Chzhen (LMO, Université Paris-Saclay, CNRS, Inria); Christophe Denis (LAMA, Université Gustave Eiffel; MIA-Paris, AgroParisTech, INRAE, Université Paris-Saclay); Mohamed Hebiri (LAMA, Université Gustave Eiffel; CREST, ENSAE, IP Paris); Luca Oneto (DIBRIS, University of Genoa); Massimiliano Pontil (Istituto Italiano di Tecnologia; University College London) |
| Pseudocode | Yes | A pseudo-code implementation of $\hat{g}$ in Eq. (6) is reported in Algorithm 1. |
| Open Source Code | Yes | The source of our method can be found at https://github.com/lucaoneto/NIPS2020_Fairness. |
| Open Datasets | Yes | Communities&Crime (CRIME) contains socio-economic, law enforcement, and crime data about communities in the US [37]... Law School (LAW) refers to the Law School Admissions Council's National Longitudinal Bar Passage Study [44]... National Longitudinal Survey of Youth (NLSY) involves survey results by the U.S. Bureau of Labor Statistics that are intended to gather information on the labor market activities and other life events of several groups [8]... Student Performance (STUD) approaches 649 students' achievement (final grade) in secondary education of two Portuguese schools using 33 attributes [14]... |
| Dataset Splits | Yes | For all datasets we split the data in two parts (70% train and 30% test); this procedure is repeated 30 times, and we report the average performance on the test set alongside its standard deviation. We employ the 2-step 10-fold CV procedure considered by [17] to select the best hyperparameters with the training set. |
| Hardware Specification | No | The paper does not mention any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper refers to types of base estimators like "RLS", "KRLS", and "RF", but it does not specify any software libraries, frameworks, or tools with their version numbers that were used for implementation or experimentation. |
| Experiment Setup | Yes | The hyperparameters of the methods are set as follows. For RLS we set the regularization hyperparameter λ ∈ 10^{-4.5, -3.5, ..., 3}, and for KRLS we set λ ∈ 10^{-4.5, -3.5, ..., 3} and γ ∈ 10^{-4.5, -3.5, ..., 3}. Finally, for RF we set the number of trees to 1000, and for the number of features to select during tree creation we search in {d^{1/4}, d^{1/2}, d^{3/4}}. *(See the experiment-setup sketch after the table.)* |
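
The Research Type row quotes the two evaluation measures used in the paper: the empirical MSE and the empirical Kolmogorov-Smirnov (KS) violation of the fairness constraint. The following is a minimal sketch of how these quantities could be computed; it is not the authors' code, and the function names (`empirical_mse`, `empirical_ks`) and the NumPy-based implementation are illustrative assumptions.

```python
import numpy as np

def empirical_mse(y_true, y_pred):
    """Empirical mean squared error over the test set T."""
    return np.mean((y_true - y_pred) ** 2)

def empirical_ks(y_pred, s):
    """Empirical Kolmogorov-Smirnov fairness violation: the largest gap, over
    all pairs of sensitive groups and all thresholds t, between the empirical
    CDFs of the predictions g(X, S) within each group."""
    groups = np.unique(s)
    # The supremum over t is attained at a jump point of one of the group-wise
    # empirical CDFs, i.e. at one of the pooled predicted values.
    thresholds = np.sort(y_pred)
    ks = 0.0
    for i, s1 in enumerate(groups):
        for s2 in groups[i + 1:]:
            cdf1 = np.searchsorted(np.sort(y_pred[s == s1]),
                                   thresholds, side="right") / np.sum(s == s1)
            cdf2 = np.searchsorted(np.sort(y_pred[s == s2]),
                                   thresholds, side="right") / np.sum(s == s2)
            ks = max(ks, float(np.max(np.abs(cdf1 - cdf2))))
    return ks

# Toy usage: predictions for group 1 are shifted upwards, so KS is large.
rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=1000)
y_true = rng.normal(size=1000)
y_pred = rng.normal(loc=0.5 * s, size=1000)
print(empirical_mse(y_true, y_pred), empirical_ks(y_pred, s))
```

Evaluating both group-wise empirical CDFs on the pooled sorted predictions suffices because the absolute difference of two step functions attains its supremum at one of their jump points.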
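The Dataset Splits and Experiment Setup rows describe 30 repetitions of a 70/30 train/test split, with hyperparameters selected by cross-validation on the training part over the quoted grids for RLS, KRLS, and RF. The paper does not name a software stack, so the sketch below is a hedged reconstruction using scikit-learn (Ridge for RLS, KernelRidge for KRLS, RandomForestRegressor for RF) and a plain 10-fold grid search in place of the 2-step 10-fold CV procedure of [17]; the function name `run_repetitions`, the exact grid spacing, and the feature count `d` are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Log-spaced grid spanning 10^{-4.5} to 10^{3} for the regularization (lambda)
# and kernel-width (gamma) hyperparameters; the exact spacing is illustrative.
log_grid = np.logspace(-4.5, 3.0, num=16)
d = 20  # number of features; dataset dependent

models = {
    "RLS":  (Ridge(), {"alpha": log_grid}),
    "KRLS": (KernelRidge(kernel="rbf"), {"alpha": log_grid, "gamma": log_grid}),
    "RF":   (RandomForestRegressor(n_estimators=1000),
             {"max_features": [int(d ** p) for p in (0.25, 0.5, 0.75)]}),
}

def run_repetitions(X, s, y, n_repeats=30, seed=0):
    """Random 70/30 train/test splits repeated n_repeats times; hyperparameters
    are chosen on the training part by 10-fold cross-validated grid search
    (a stand-in for the 2-step 10-fold CV procedure referenced in the paper)."""
    rng = np.random.RandomState(seed)
    results = []
    for _ in range(n_repeats):
        X_tr, X_te, s_tr, s_te, y_tr, y_te = train_test_split(
            X, s, y, test_size=0.3, random_state=rng)
        for name, (estimator, grid) in models.items():
            search = GridSearchCV(estimator, grid, cv=10,
                                  scoring="neg_mean_squared_error")
            # Predictions depend on (X, S), matching the paper's g(X, S).
            search.fit(np.column_stack([X_tr, s_tr]), y_tr)
            y_pred = search.predict(np.column_stack([X_te, s_te]))
            mse = np.mean((y_te - y_pred) ** 2)
            # empirical_ks(y_pred, s_te) from the previous sketch would be
            # reported alongside the MSE.
            results.append((name, mse))
    return results
```

Averages and standard deviations over the 30 repetitions would then be reported per model, as in the quoted protocol.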