Fair Wasserstein Coresets

Authors: Zikai Xiong, Niccolo Dalmasso, Shubham Sharma, Freddy Lecue, Daniele Magazzeni, Vamsi Potluru, Tucker Balch, Manuela Veloso

NeurIPS 2024

Reproducibility assessment: each variable below is listed with its result and the supporting LLM response.

Research Type: Experimental
Experiments conducted on both synthetic and real datasets show that FWC: (i) achieves a competitive fairness-utility tradeoff in downstream models compared to existing approaches, (ii) improves downstream fairness when added to the existing training data, and (iii) can be used to reduce biases in predictions from large language models (GPT-3.5 and GPT-4).

Researcher Affiliation: Collaboration
Operations Research Center, Massachusetts Institute of Technology (zikai@mit.edu); J.P. Morgan AI Research ({niccolo.dalmasso, shubham.x2.sharma, freddy.lecue, daniele.magazzeni, vamsi.k.potluru, tucker.balch, manuela.veloso}@jpmchase.com).

Pseudocode: Yes
Algorithm 1: Majority Minimization for Solving (9).

Open Source Code: No
Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: Data are openly available and details to reproduce the main experimental results are provided in Section 7 and Appendix C. The code is not available publicly at the moment.

Open Datasets: Yes
We evaluate the performance of FWC on 4 datasets widely used in the fairness literature [19]: (i) Adult [7], (ii) German Credit [28], (iii) Communities and Crime [64], and (iv) Drug [20].

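As an illustration of data access, the Adult dataset can be fetched from OpenML via scikit-learn. The loading route below is an assumption; the paper does not state how the datasets were obtained.

    from sklearn.datasets import fetch_openml

    # Fetch the UCI Adult dataset (one of the four benchmarks above) from OpenML.
    # This loading route is an assumption; the paper does not specify its pipeline.
    adult = fetch_openml("adult", version=2, as_frame=True)
    X = adult.data    # features, including protected attributes such as sex/race
    y = adult.target  # income label: '<=50K' vs '>50K'
    print(X.shape, y.value_counts())
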
Dataset Splits: Yes
For all datasets, we randomly split the data into 75% training and 25% test sets, changing the split on each separate run; the training data are further split 90/10 into training and validation sets to compute early-stopping criteria during training.

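A minimal sketch of this splitting scheme, assuming scikit-learn's train_test_split (the paper does not state which splitting utility it uses):

    import numpy as np
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng()
    X = rng.normal(size=(1000, 10))    # placeholder features
    y = rng.integers(0, 2, size=1000)  # placeholder binary labels

    # 75%/25% train/test split, re-drawn on every run (no fixed seed)
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75)
    # The training data are further split 90/10 into train/validation
    # to evaluate the early-stopping criterion during training.
    X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, train_size=0.9)
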
Hardware Specification: Yes
All computations are run on an Ubuntu machine with 32 GB of RAM and a 2.50 GHz Intel(R) Xeon(R) Platinum 8259CL CPU.

Software Dependencies: No
For the downstream classifier, we use the Adam optimizer [38] with a learning rate of 10^-3, a batch size of 32, and a maximum of 500 epochs with early stopping evaluated on the separate validation set with a patience of 10 epochs; both the features X and the protected attribute D are used for training the classifier. For k-means [42] and k-medoids [45, 58] we use the implementations available in the Python package scikit-learn [59]. The paper mentions software components like the Adam optimizer and scikit-learn but does not provide specific version numbers for these or other key libraries, which would be needed to fully reproduce the software environment.

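Because no versions are pinned, reproducers will need to record their own environment. A minimal sketch follows; the package list is an assumption based on the methods mentioned, and PyTorch is only a guess for the MLP framework, which the paper does not name.

    import numpy, sklearn, torch

    # Log the versions actually used, since the paper pins none of them.
    for module in (numpy, sklearn, torch):
        print(module.__name__, module.__version__)
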
Experiment Setup: Yes
For FWC, we consider three different values of the fairness-violation hyperparameter ϵ for the optimization problem in (5). We compute the fairness-utility tradeoff by first training a 2-layer multilayer perceptron (MLP) classifier with ReLU activations on the coresets created by each approach, and then evaluating the classifier's demographic disparity (fairness) and AUC (utility). For the downstream classifier, we use the Adam optimizer [38] with a learning rate of 10^-3, a batch size of 32, and a maximum of 500 epochs with early stopping evaluated on the separate validation set with a patience of 10 epochs; both the features X and the protected attribute D are used for training the classifier.
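A minimal sketch of the downstream classifier described above, assuming PyTorch and a hidden width of 64 (both assumptions); the optimizer, learning rate of 10^-3, batch size of 32, 500-epoch cap, and patience of 10 come from the quoted setup.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    def make_mlp(d_in: int, hidden: int = 64) -> nn.Module:
        # 2-layer MLP with ReLU activations; the hidden width is an assumption.
        return nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def train(model, train_ds, val_ds, max_epochs=500, patience=10):
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, lr = 10^-3
        loss_fn = nn.BCEWithLogitsLoss()
        loader = DataLoader(train_ds, batch_size=32, shuffle=True)
        best_val, epochs_since_best = float("inf"), 0
        for _ in range(max_epochs):
            model.train()
            for xb, yb in loader:
                opt.zero_grad()
                loss_fn(model(xb).squeeze(-1), yb).backward()
                opt.step()
            # Early stopping on the separate validation split (patience = 10).
            model.eval()
            with torch.no_grad():
                xv, yv = val_ds.tensors
                val_loss = loss_fn(model(xv).squeeze(-1), yv).item()
            if val_loss < best_val:
                best_val, epochs_since_best = val_loss, 0
            else:
                epochs_since_best += 1
                if epochs_since_best >= patience:
                    break
        return model

    # Usage: per the paper, the features X and protected attribute D are both
    # fed to the classifier, e.g. concatenated into one input matrix XD:
    # train_ds = TensorDataset(torch.tensor(XD_tr, dtype=torch.float32),
    #                          torch.tensor(y_tr, dtype=torch.float32))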