Auditing Fairness by Betting

Authors: Ben Chugg, Santiago Cortes-Gomez, Bryan Wilder, Aaditya Ramdas

NeurIPS 2023

Each entry below pairs a reproducibility variable and its result with the LLM response that supports it.
Research Type: Experimental
LLM Response: "We demonstrate the efficacy of our approach on three benchmark fairness datasets."

Researcher Affiliation: Academia
LLM Response: "Departments of Machine Learning and Statistics, Carnegie Mellon University. {benchugg, scortesg, bwilder, aramdas}@cmu.edu"

Pseudocode: Yes
LLM Response: "Algorithm 1: Testing group fairness by betting"

Open Source Code: Yes
LLM Response: "All code is publicly available at https://github.com/bchugg/auditing-fairness."

Open Datasets: Yes
LLM Response: "We demonstrate the efficacy of our approach on three benchmark fairness datasets": credit default data, US census data, and insurance data. A vanilla random forest was used for both the credit default dataset and the US census data, with the census data accessed by means of the folktables package [51]. The US health insurance data [52] are synthetic, generated from US census information.

Dataset Splits: No
LLM Response: The paper discusses fixed-time tests that use "a batch of n iid observations," but it does not specify train/validation/test splits (percentages or counts) for its own experiments in the main text. The focus is on sequential testing, where data arrive continuously.

Hardware Specification: No
LLM Response: The paper does not report hardware details such as GPU models, CPU models, or memory specifications used to run the experiments.

Software Dependencies: No
LLM Response: The paper mentions "vanilla random forest" and "logistic regression" models and "the folktables package [51]," but it does not give software names with version numbers for libraries, frameworks, or languages (e.g., Python, scikit-learn, TensorFlow/PyTorch).

Experiment Setup: No
LLM Response: The paper states that a vanilla random forest and a logistic regression model were used, and briefly describes how models are made fair or unfair by manipulating means. However, it does not provide specific setup details such as hyperparameters (e.g., learning rate, batch size, number of epochs, optimizer settings) or other system-level training configurations.
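
To make the pseudocode row concrete, here is a toy, hypothetical sketch of the betting idea behind "Testing group fairness by betting", not the paper's Algorithm 1: under the fairness null, paired prediction differences between two groups have mean zero, so a wealth process that bets on those differences is a nonnegative martingale, and by Ville's inequality it exceeds 1/alpha with probability at most alpha. The function name, the clipped running-mean betting rule (the paper uses an online Newton step), and the simulated streams are all illustrative assumptions.

```python
import numpy as np

def betting_audit(diffs, alpha=0.05):
    """Toy sequential fairness audit by betting (illustrative sketch only).

    `diffs` is a stream of paired prediction differences between two groups,
    assumed to lie in [-1, 1]. Under the null (group fairness) the
    differences have mean zero, so the wealth process below is a
    nonnegative martingale starting at 1; by Ville's inequality,
    P(sup_t wealth_t >= 1/alpha) <= alpha under the null.
    """
    wealth = 1.0
    running_mean = 0.0  # predictable estimate of E[d], used to set the bet
    for t, d in enumerate(diffs, start=1):
        # Simple clipped-mean bet; the paper's algorithm uses online Newton step.
        lam = float(np.clip(running_mean, -0.5, 0.5))
        wealth *= 1.0 + lam * d            # place the bet (stays > 0 since |lam*d| <= 0.5)
        running_mean += (d - running_mean) / t  # update AFTER betting (predictability)
        if wealth >= 1.0 / alpha:
            return t  # reject the null: evidence the model treats the groups differently
    return None  # wealth never crossed 1/alpha: no rejection

rng = np.random.default_rng(0)
# "Unfair" model: groupwise mean predictions differ by 0.3
unfair_diffs = np.clip(rng.normal(0.3, 0.2, size=2000), -1.0, 1.0)
# "Fair" model: mean-zero differences
fair_diffs = np.clip(rng.normal(0.0, 0.2, size=2000), -1.0, 1.0)

print(betting_audit(unfair_diffs))  # rejects after a modest number of samples
print(betting_audit(fair_diffs))    # typically None (no rejection)
```

The sequential character is what distinguishes this from a fixed-time test: the audit can stop and reject the moment the wealth crosses 1/alpha, while continuing to observe data costs nothing in validity.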