Probably Approximate Shapley Fairness with Applications in Machine Learning
Authors: Zijian Zhou, Xinyi Xu, Rachael Hwee Ling Sim, Chuan Sheng Foo, Bryan Kian Hsiang Low
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify GAE outperforms several existing methods in guaranteeing fairness while remaining competitive in estimation accuracy in various ML scenarios using real-world datasets. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science, National University of Singapore, Singapore 2Institute for Infocomm Research, A*STAR, Singapore 3Centre for Frontier AI Research, A*STAR, Singapore |
| Pseudocode | No | The details and full pseudo-code of the algorithm are given in (Zhou et al. 2023). |
| Open Source Code | Yes | the code is available at https://github.com/BobbyZhouZijian/ProbablyApproximateShapleyFairness. |
| Open Datasets | Yes | breast cancer (Street 1995) and MNIST (Le Cun et al. 1990) (diabetes (Efron et al. 2004)) datasets and set test accuracy (negative mean squared error) as v using data Shapley (Ghorbani and Zou 2019) with mi = 50. fi is approximated using sample evaluations. ... synthetic Gaussian (Kwon and Zou 2022) and Covertype (Blackard 1998) datasets... used-car price prediction (Aditya 2019) and credit card fraud detection (Dal Pozzolo et al. 2014)... hotel reviews sentiment prediction (Alam, Ryu, and Lee 2016) and Uber-lyft rides price prediction (BM 2018); in addition, we also consider (Wang et al. 2020, Definition 1) (FL) using two image recognition tasks (MNIST (Le Cun et al. 1990) and CIFAR-10 (Krizhevsky, Sutskever, and Hinton 2012)) and two natural language processing tasks (movie reviews (Pang and Lee 2005) and Stanford Sentiment Treebank-5 (Kim 2014))... adult income (Kohavi and Becker 1996), iris (Fisher 1988), wine (Forina et al. 1991), and covertype (Blackard 1998) classification. |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or sample counts) for the underlying ML models that produce the utility function 'v', which is necessary for full reproducibility of the experiment. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | For bootstrapping (included for all baselines), we uniformly randomly select 20 permutations and evaluate the marginal contributions for each i. We set a budget m = 2000 for each baseline. ... For hyperparameters, since the largest n among these scenarios is 7, we set the budget m = 1000 and the bootstrapping of 300 evaluations (a total of 1300 evaluations for each baseline). We set ξ = 1e-3 and vary α ∈ {0, 2, 5, 100}, where 100 simulates α → ∞. |
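The experiment setup above relies on the standard Monte Carlo scheme for Shapley values: sample random permutations of the players, and for each player average its marginal contribution to the coalition of players preceding it. The sketch below is a minimal, generic illustration of that scheme, not the paper's GAE estimator; the additive utility function `v` and the `weights` dictionary are hypothetical stand-ins for a trained model's test accuracy, chosen because they make the exact Shapley values known (equal to each player's weight).

```python
import random

def shapley_monte_carlo(players, v, num_permutations, seed=0):
    """Estimate Shapley values by averaging each player's marginal
    contribution over uniformly sampled permutations."""
    rng = random.Random(seed)
    estimates = {i: 0.0 for i in players}
    for _ in range(num_permutations):
        perm = players[:]
        rng.shuffle(perm)
        coalition = []
        prev = v(frozenset(coalition))  # utility of the empty coalition
        for i in perm:
            coalition.append(i)
            cur = v(frozenset(coalition))
            estimates[i] += cur - prev  # marginal contribution of i
            prev = cur
    return {i: s / num_permutations for i, s in estimates.items()}

# Hypothetical additive game: v(S) = sum of per-player weights, so the
# exact Shapley value of player i is weights[i].
weights = {0: 1.0, 1: 2.0, 2: 3.0}
v = lambda S: sum(weights[i] for i in S)
print(shapley_monte_carlo([0, 1, 2], v, num_permutations=200))
```

For an additive game every marginal contribution of player i equals its weight, so the estimate is exact here regardless of the sampled permutations; with a real utility (e.g., retraining a model and measuring test accuracy, as in data Shapley) the estimate converges only as the permutation budget grows, which is why the paper fixes an evaluation budget m per baseline.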