FairWASP: Fast and Optimal Fair Wasserstein Pre-processing

Authors: Zikai Xiong, Niccolò Dalmasso, Alan Mishler, Vamsi K. Potluru, Tucker Balch, Manuela Veloso

AAAI 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments demonstrate that our proposed optimization algorithm significantly outperforms state-of-the-art commercial solvers in solving both the MIP and its linear program relaxation. Further experiments highlight the competitive performance of FairWASP in reducing disparities while preserving accuracy in downstream classification settings.
Researcher Affiliation Collaboration Zikai Xiong1, Niccolò Dalmasso2,*, Alan Mishler2, Vamsi K. Potluru2, Tucker Balch2, Manuela Veloso2 1Massachusetts Institute of Technology 2J.P. Morgan AI Research, New York zikai@mit.edu, {niccolo.dalmasso, first.last}@jpmchase.com
Pseudocode Yes Algorithm 1: General Cutting Plane Method for (D-2)
1: Choose a bounded set E_0 containing an optimal solution
2: for k from 0 to n do
3:   Choose λ_k from E_k
4:   Compute g ∈ ℝ^m such that g⊤λ_k ≤ g⊤λ* for any λ* ∈ Λ*
5:   Choose E_{k+1} ⊇ {λ ∈ E_k : g⊤λ ≥ g⊤λ_k}
6: end for
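The cutting-plane loop above can be sketched in one dimension, where the localization set E_k is an interval and the cut halves it at each step. The concave objective f(λ) = -(λ - 3)² and the starting interval below are illustrative assumptions, not part of the paper:

```python
def cutting_plane_1d(subgrad, lo, hi, iters=60):
    """Generic 1-D cutting-plane method with interval localization sets.

    At the midpoint lam_k we query a subgradient g of the concave
    objective; the cut {lam : g * lam >= g * lam_k} keeps the half of
    [lo, hi] that still contains every maximizer.
    """
    for _ in range(iters):
        lam_k = (lo + hi) / 2.0
        g = subgrad(lam_k)
        if g >= 0:   # every maximizer satisfies lam >= lam_k
            lo = lam_k
        else:        # every maximizer satisfies lam <= lam_k
            hi = lam_k
    return (lo + hi) / 2.0

# Example: maximize f(lam) = -(lam - 3)^2, whose subgradient is -2(lam - 3).
lam_star = cutting_plane_1d(lambda lam: -2.0 * (lam - 3.0), 0.0, 10.0)
```

In higher dimensions the same pattern holds, but E_k is typically an ellipsoid or polyhedron and the cut is a half-space through λ_k.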
Open Source Code No The paper states 'See the Supplementary Materials for complete proofs of theoretical claims, more discussion, and details on our algorithm and experiments results.', but does not explicitly state that the source code for the FairWASP methodology is publicly available or provide a link to it.
Open Datasets Yes Real Datasets We consider the following four real datasets widely used in the fairness literature (Fabris et al. 2022): (i) the Adult dataset (Becker and Kohavi 1996), (ii) the Drug dataset (Fehrman et al. 2017), (iii) the Communities and Crime dataset (Redmond 2009) and (iv) the German Credit dataset (Hofmann 1994).
Dataset Splits No The paper mentions '10 different train/test splits' but does not specify a separate validation split or its size.
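A 10-split evaluation protocol like the one referenced above can be reproduced with seeded random permutations. The 80/20 train/test ratio below is an assumption, since the paper does not report the split sizes:

```python
import numpy as np

def repeated_splits(n_samples, n_splits=10, test_frac=0.2):
    """Yield (train_idx, test_idx) index pairs for repeated random splits.

    Each split uses its own seeded generator so every run is reproducible.
    """
    n_test = int(n_samples * test_frac)
    for seed in range(n_splits):
        rng = np.random.default_rng(seed)
        perm = rng.permutation(n_samples)
        yield perm[n_test:], perm[:n_test]

splits = list(repeated_splits(100))
```

Each split partitions all 100 indices: the first 20 permuted indices form the test set and the remaining 80 the training set.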
Hardware Specification No The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments. It only mentions runtime comparisons.
Software Dependencies No The paper mentions using commercial solvers Gurobi and Mosek and that 'The commercial solvers are run with default settings', but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup Yes In all methods, the pre-processed dataset (or the dataset with no pre-processing, for the Uniform approach) is used to train a multi-layer perceptron (MLP) classifier with one hidden layer with 20 nodes and ReLU activation function.
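The downstream classifier described above can be sketched with scikit-learn. Only the architecture (one hidden layer of 20 nodes, ReLU activation) comes from the paper; the synthetic stand-in data and the remaining training hyperparameters are assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a pre-processed dataset: 5 features, binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Architecture from the paper: one hidden layer with 20 nodes, ReLU activation.
clf = MLPClassifier(hidden_layer_sizes=(20,), activation="relu",
                    max_iter=500, random_state=0)
clf.fit(X, y)
train_acc = clf.score(X, y)
```

In the paper's setting, X and y would instead come from one of the (re-weighted or re-sampled) pre-processed real datasets, trained once per train/test split.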