Faking Fairness via Stealthily Biased Sampling
Authors: Kazuto Fukuchi, Satoshi Hara, Takanori Maehara
AAAI 2020, pp. 412-419
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we show that the stealthily biased sampling is indeed difficult to detect, through experiments on synthetic data and two real-world data (COMPAS and Adult). |
| Researcher Affiliation | Academia | University of Tsukuba; Osaka University; RIKEN Center for Advanced Intelligence Project |
| Pseudocode | No | The paper describes algorithms conceptually and mathematically, but it does not provide pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | The codes can be found at https://github.com/sato9hara/stealthily-biased-sampling |
| Open Datasets | Yes | For the first real-world data experiment, we focus on the COMPAS dataset (Angwin et al. 2016). [...] As the second real-world data experiment, we used the Adult dataset (Dheeru and Karra Taniskidou 2017). |
| Dataset Splits | No | For COMPAS: 'we randomly held out 1,278 records as the referential dataset D for the detector. From the remaining 4,000 records D, we sampled 2,000 records as Z'. For Adult: 'we randomly split 10,000 records for the training set, 20,000 records for the test set, and the remaining 18,842 records for the referential set D for the detector.' No explicit mention of a validation set split. |
| Hardware Specification | No | The paper describes the software used (e.g., Python 3, LEMON Graph Library) and the computational method (network simplex method) but does not specify any particular hardware used for running the experiments (e.g., CPU/GPU models, memory). |
| Software Dependencies | No | We used Python 3 for data processing. [...] To solve the minimum-cost flow problem (3.2), we used the network simplex method implemented in LEMON Graph Library. (Python is specified only by its major version, 3, and LEMON Graph Library has no version number at all, so the software environment cannot be fully reproduced.) |
| Experiment Setup | Yes | We set the parameters in the criteria (5.1) to be b = 0.2. [...] We sampled sensitive feature s with P(s = 1) = 0.5, and sampled feature vector x in a uniformly random manner over [0, 1]^d with d = 1. [...] In the experiment, we set the significance level of the test to be 0.05. [...] To reduce the DP in the sampling, we required the sampled set to satisfy P(y = 1 \| s = 1) − P(y = 1 \| s = 0) ≤ α for some α ∈ [0, 1]. |
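The split sizes quoted under Dataset Splits are internally consistent: 1,278 + 4,000 = 5,278 COMPAS records, and 10,000 + 20,000 + 18,842 = 48,842 Adult records. A minimal sketch of such random partitions follows, assuming a uniform random permutation with an arbitrary seed. The helper `split_indices` and the seed values are hypothetical and are not taken from the authors' repository. The 2,000-record set Z is deliberately left out, because Z is the output of the sampling procedure under study rather than part of a fixed split.

```python
import numpy as np

def split_indices(n_total, sizes, seed=0):
    """Hypothetical helper: randomly partition range(n_total) into blocks of the given sizes."""
    if sum(sizes) != n_total:
        raise ValueError("split sizes must add up to the number of records")
    rng = np.random.default_rng(seed)  # arbitrary seed, not from the paper
    perm = rng.permutation(n_total)
    # Cumulative offsets mark where each block ends in the shuffled order.
    bounds = np.cumsum(sizes)[:-1]
    return np.split(perm, bounds)

# COMPAS: 1,278 referential records for the detector, 4,000 remaining records.
compas_ref, compas_rest = split_indices(5278, [1278, 4000])

# Adult: 10,000 training, 20,000 test, 18,842 referential records.
adult_train, adult_test, adult_ref = split_indices(48842, [10000, 20000, 18842])
```

Passing the sizes explicitly and checking that they sum to the record count catches a mistyped split size early instead of silently producing an uneven remainder block.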
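The synthetic setting quoted under Experiment Setup can be sketched as follows, assuming the DP constraint on the sampled set reads P(y = 1 | s = 1) − P(y = 1 | s = 0) ≤ α. The function names and seed are hypothetical. The excerpt does not say how labels y are generated in the synthetic experiment, so the sketch only draws s and x and checks the constraint on labels supplied by the caller.

```python
import numpy as np

def sample_synthetic(n, d=1, p_s1=0.5, seed=0):
    """Hypothetical sketch: s ~ Bernoulli(p_s1), x ~ Uniform([0, 1]^d)."""
    rng = np.random.default_rng(seed)
    s = (rng.random(n) < p_s1).astype(int)  # P(s = 1) = p_s1
    x = rng.random((n, d))                  # each coordinate uniform on [0, 1)
    return s, x

def dp_gap(y, s):
    """Empirical P(y = 1 | s = 1) - P(y = 1 | s = 0) for binary y and s."""
    y, s = np.asarray(y), np.asarray(s)
    return y[s == 1].mean() - y[s == 0].mean()

def satisfies_dp_constraint(y, s, alpha):
    """Assumed one-sided constraint: the DP gap must not exceed alpha."""
    return dp_gap(y, s) <= alpha

# Toy labels: P(y=1|s=1) = 2/3 and P(y=1|s=0) = 1/3, so the gap is about 0.333.
y_toy = np.array([1, 1, 0, 1, 0, 0])
s_toy = np.array([1, 1, 1, 0, 0, 0])
gap = dp_gap(y_toy, s_toy)
```

With these toy labels the constraint fails for α = 0.2 and holds for α = 0.5, which illustrates how a smaller α forces the sampled set closer to demographic parity.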