Probably Approximate Shapley Fairness with Applications in Machine Learning

Authors: Zijian Zhou, Xinyi Xu, Rachael Hwee Ling Sim, Chuan Sheng Foo, Bryan Kian Hsiang Low

AAAI 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify that GAE outperforms several existing methods in guaranteeing fairness while remaining competitive in estimation accuracy across various ML scenarios using real-world datasets. |
| Researcher Affiliation | Collaboration | 1 Department of Computer Science, National University of Singapore, Singapore; 2 Institute for Infocomm Research, A*STAR, Singapore; 3 Centre for Frontier AI Research, A*STAR, Singapore |
| Pseudocode | No | The details and full pseudo-code of the algorithm are given in (Zhou et al. 2023). |
| Open Source Code | Yes | The code is available at https://github.com/BobbyZhouZijian/ProbablyApproximateShapleyFairness. |
| Open Datasets | Yes | breast cancer (Street 1995) and MNIST (Le Cun et al. 1990) (diabetes (Efron et al. 2004)) datasets, with test accuracy (negative mean squared error) set as v using data Shapley (Ghorbani and Zou 2019) with m_i = 50; φ_i is approximated using sample evaluations. ... synthetic Gaussian (Kwon and Zou 2022) and Covertype (Blackard 1998) datasets ... used-car price prediction (Aditya 2019) and credit card fraud detection (Dal Pozzolo et al. 2014) ... hotel reviews sentiment prediction (Alam, Ryu, and Lee 2016) and Uber-Lyft rides price prediction (BM 2018); in addition, we also consider (Wang et al. 2020, Definition 1) (FL) using two image recognition tasks (MNIST (Le Cun et al. 1990) and CIFAR-10 (Krizhevsky, Sutskever, and Hinton 2012)) and two natural language processing tasks (movie reviews (Pang and Lee 2005) and Stanford Sentiment Treebank-5 (Kim 2014)) ... adult income (Kohavi and Becker 1996), iris (Fisher 1988), wine (Forina et al. 1991), and covertype (Blackard 1998) classification. |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or sample counts) for the underlying ML models that produce the utility function v, which is necessary for full reproducibility of the experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | For bootstrapping (included for all baselines), we uniformly randomly select 20 permutations and evaluate the marginal contributions for each i. We set a budget m = 2000 for each baseline. ... For hyperparameters, since the largest n among these scenarios is 7, we set the budget m = 1000 and a bootstrapping of 300 evaluations (a total of 1300 evaluations for each baseline). We set ξ = 1e-3 and vary α ∈ {0, 2, 5, 100}, where 100 simulates α → ∞. |
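The experiment setup above relies on estimating Shapley values by sampling uniformly random permutations and averaging marginal contributions under an evaluation budget m. Below is a minimal illustrative sketch of that budgeted permutation-sampling estimator; the function name `shapley_permutation_estimate` and the toy additive utility are assumptions for illustration, not the paper's GAE algorithm.

```python
import random

def shapley_permutation_estimate(players, v, budget=2000, seed=0):
    """Monte Carlo Shapley estimate via uniformly random permutations.

    Each full permutation of n players costs n marginal-contribution
    evaluations of the utility v; sampling stops once spending another
    full permutation would exceed the evaluation budget.
    """
    rng = random.Random(seed)
    n = len(players)
    totals = {p: 0.0 for p in players}   # running sums of marginal contributions
    counts = {p: 0 for p in players}     # number of samples per player
    spent = 0
    while spent + n <= budget:
        perm = list(players)
        rng.shuffle(perm)                # uniformly random permutation
        coalition = []
        prev = v(coalition)              # utility of the empty coalition
        for p in perm:
            coalition.append(p)
            cur = v(coalition)
            totals[p] += cur - prev      # marginal contribution of p
            counts[p] += 1
            prev = cur
            spent += 1
    # Average marginal contribution per player is the Shapley estimate.
    return {p: totals[p] / max(counts[p], 1) for p in players}
```

For a sanity check, an additive utility (v(S) is a sum of per-player weights) makes every marginal contribution equal to the player's weight, so the estimator recovers the weights exactly regardless of the sampled permutations.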