Faking Fairness via Stealthily Biased Sampling

Authors: Kazuto Fukuchi, Satoshi Hara, Takanori Maehara

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable: Research Type
Result: Experimental
LLM Response: In this section, we show that the stealthily biased sampling is indeed difficult to detect, through experiments on synthetic data and two real-world data (COMPAS and Adult).

Reproducibility Variable: Researcher Affiliation
Result: Academia
LLM Response: (1) University of Tsukuba, (2) Osaka University, (3) RIKEN Center for Advanced Intelligence Project

Reproducibility Variable: Pseudocode
Result: No
LLM Response: The paper describes algorithms conceptually and mathematically, but it does not provide pseudocode or a clearly labeled algorithm block.

Reproducibility Variable: Open Source Code
Result: Yes
LLM Response: The codes can be found at https://github.com/sato9hara/stealthily-biased-sampling

Reproducibility Variable: Open Datasets
Result: Yes
LLM Response: For the first real-world data experiment, we focus on the COMPAS dataset (Angwin et al. 2016). [...] As the second real-world data experiment, we used the Adult dataset (Dheeru and Karra Taniskidou 2017).

Reproducibility Variable: Dataset Splits
Result: No
LLM Response: For COMPAS: 'we randomly held out 1,278 records as the referential dataset D for the detector. From the remaining 4,000 records D, we sampled 2,000 records as Z'. For Adult: 'we randomly split 10,000 records for the training set, 20,000 records for the test set, and the remaining 18,842 records for the referential set D for the detector.' No explicit mention of a validation set split. (A hedged sketch of these splits appears after this table.)

Reproducibility Variable: Hardware Specification
Result: No
LLM Response: The paper describes the software used (e.g., Python 3, LEMON Graph Library) and the computational method (network simplex method) but does not specify any particular hardware used for running the experiments (e.g., CPU/GPU models, memory).

Reproducibility Variable: Software Dependencies
Result: No
LLM Response: We used Python 3 for data processing. [...] To solve the minimum-cost flow problem (3.2), we used the network simplex method implemented in LEMON Graph Library. (While Python 3 is versioned, the LEMON Graph Library lacks a specified version number, preventing full reproducibility of the software environment.) (A minimum-cost-flow sketch appears after this table.)

Reproducibility Variable: Experiment Setup
Result: Yes
LLM Response: We set the parameters in the criteria (5.1) to be b = 0.2. [...] We sampled sensitive feature s with P(s = 1) = 0.5, and sampled feature vector x in a uniformly random manner over [0, 1]^d with d = 1. [...] In the experiment, we set the significance level of the test to be 0.05. [...] To reduce the DP in the sampling, we required the sampled set to satisfy |P(y = 1 | s = 1) − P(y = 1 | s = 0)| ≤ α for some α ∈ [0, 1]. (A sketch of this synthetic setup appears after this table.)
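The splits quoted under Dataset Splits (COMPAS: 1,278 referential records held out, 4,000 remaining, 2,000 sampled as Z; Adult: 10,000 training / 20,000 test / 18,842 referential) are straightforward to rehearse. Below is a minimal sketch, assuming the records are stored in NumPy arrays and using an arbitrary random seed; the paper does not report a seed, and these helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption; the paper does not report one

def split_compas(records):
    """COMPAS: hold out 1,278 records as the detector's referential set,
    keep the remaining 4,000 records, and sample 2,000 of them as Z."""
    idx = rng.permutation(len(records))
    referential = records[idx[:1278]]
    remaining = records[idx[1278:1278 + 4000]]
    z = remaining[rng.choice(len(remaining), size=2000, replace=False)]
    return referential, remaining, z

def split_adult(records):
    """Adult: 10,000 training / 20,000 test / 18,842 referential records."""
    idx = rng.permutation(len(records))
    train = records[idx[:10000]]
    test = records[idx[10000:30000]]
    referential = records[idx[30000:30000 + 18842]]
    return train, test, referential
```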
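The Software Dependencies entry notes that the paper solves its sampling problem (3.2) as a minimum-cost flow instance using the network simplex method from the LEMON Graph Library (a C++ library). As a rough stand-in, the same kind of computation can be sketched in Python with networkx's network_simplex; the toy graph below is illustrative only and is not the construction used in problem (3.2).

```python
import networkx as nx

# Toy minimum-cost flow instance solved with the network simplex method.
# Node 'demand' < 0 means supply; edges carry 'capacity' and per-unit 'weight'.
G = nx.DiGraph()
G.add_node("s", demand=-2)   # supply 2 units at the source
G.add_node("t", demand=2)    # absorb 2 units at the sink
G.add_edge("s", "a", capacity=1, weight=1)
G.add_edge("s", "b", capacity=2, weight=3)
G.add_edge("a", "t", capacity=2, weight=1)
G.add_edge("b", "t", capacity=2, weight=1)

flow_cost, flow_dict = nx.network_simplex(G)
print(flow_cost)  # 6: one unit via s->a->t (cost 2), one via s->b->t (cost 4)
```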
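The quoted synthetic setup (s drawn with P(s = 1) = 0.5, x uniform over [0, 1]^d with d = 1) and the demographic-parity criterion |P(y = 1 | s = 1) − P(y = 1 | s = 0)| ≤ α can be sketched as below. The label model and the seed are assumptions for illustration only; the quoted text does not specify how y is generated.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed is an assumption
n, d, alpha = 2000, 1, 0.1       # alpha chosen for illustration from the quoted range [0, 1]

s = rng.binomial(1, 0.5, size=n)            # sensitive feature, P(s = 1) = 0.5
x = rng.uniform(0.0, 1.0, size=(n, d))      # feature vector uniform over [0, 1]^d, d = 1
y = (x[:, 0] + 0.2 * s > 0.6).astype(int)   # assumed label model (not from the paper)

# Empirical demographic-parity gap of the sample and the criterion check.
dp_gap = abs(y[s == 1].mean() - y[s == 0].mean())
print(f"DP gap = {dp_gap:.3f}, satisfies |gap| <= alpha: {dp_gap <= alpha}")
```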