reproducibilityindex.ai

Faking Fairness via Stealthily Biased Sampling

Authors: Kazuto Fukuchi, Satoshi Hara, Takanori Maehara (pp. 412-419)

AAAI 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we show that the stealthily biased sampling is indeed difficult to detect, through experiments on synthetic data and two real-world data (COMPAS and Adult).
Researcher Affiliation	Academia	¹University of Tsukuba, ²Osaka University, ³RIKEN Center for Advanced Intelligence Project
Pseudocode	No	The paper describes algorithms conceptually and mathematically, but it does not provide pseudocode or a clearly labeled algorithm block.
Open Source Code	Yes	The codes can be found at https://github.com/sato9hara/stealthily-biased-sampling
Open Datasets	Yes	For the first real-world data experiment, we focus on the COMPAS dataset (Angwin et al. 2016). [...] As the second real-world data experiment, we used the Adult dataset (Dheeru and Karra Taniskidou 2017).
Dataset Splits	No	For COMPAS: 'we randomly held out 1,278 records as the referential dataset D′ for the detector. From the remaining 4,000 records D, we sampled 2,000 records as Z'. For Adult: 'we randomly split 10,000 records for the training set, 20,000 records for the test set, and the remaining 18,842 records for the referential set D′ for the detector.' No explicit mention of a validation set split.
Hardware Specification	No	The paper describes the software used (e.g., Python 3, LEMON Graph Library) and the computational method (network simplex method) but does not specify any particular hardware used for running the experiments (e.g., CPU/GPU models, memory).
Software Dependencies	No	We used Python 3 for data processing. [...] To solve the minimum-cost flow problem (3.2), we used the network simplex method implemented in LEMON Graph Library. (While Python 3 is versioned, the LEMON Graph Library lacks a specified version number, preventing full reproducibility of the software environment).
Experiment Setup	Yes	We set the parameters in the criteria (5.1) to be b = 0.2. [...] We sampled sensitive feature s with P(s = 1) = 0.5, and sampled feature vector x in a uniformly random manner over [0, 1]^d with d = 1. [...] In the experiment, we set the significance level of the test to be 0.05. [...] To reduce the DP in the sampling, we required the sampled set to satisfy P(y = 1 | s = 1) = P(y = 1 | s = 0) = α for some α ∈ [0, 1].
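
For readers who want a feel for the synthetic setup quoted in the Experiment Setup row, the following is a minimal sketch, not the authors' code. It draws s with P(s = 1) = 0.5 and x uniformly over [0, 1] (d = 1), then measures the demographic parity (DP) gap P(y = 1 | s = 1) − P(y = 1 | s = 0). The decision rule inside `sample_synthetic` is a hypothetical stand-in, since criteria (5.1) is not reproduced here; the function names and the sample size are also assumptions made for illustration.

```python
import random


def sample_synthetic(n, b=0.2, seed=0):
    """Draw n records (s, x, y) following the quoted sampling setup.

    The decision rule for y is an ASSUMED placeholder for criteria (5.1),
    which this summary does not reproduce.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        s = 1 if rng.random() < 0.5 else 0  # P(s = 1) = 0.5
        x = rng.random()                    # x uniform over [0, 1], d = 1
        y = 1 if x + b * s >= 0.5 else 0    # hypothetical rule, not from the paper
        records.append((s, x, y))
    return records


def positive_rate(records, group):
    """Empirical P(y = 1 | s = group)."""
    ys = [y for s, _, y in records if s == group]
    return sum(ys) / len(ys)


def demographic_parity_gap(records):
    """Empirical P(y = 1 | s = 1) - P(y = 1 | s = 0)."""
    return positive_rate(records, 1) - positive_rate(records, 0)


records = sample_synthetic(10_000)
gap = demographic_parity_gap(records)
print(f"DP gap on the full sample: {gap:.3f}")
```

Under this placeholder rule the gap should land near b, because the sensitive feature shifts the acceptance threshold by that amount. A biased sampler in the sense of the paper would then pick a subset whose gap is much smaller while staying close in distribution to the original data.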