On the Within-Group Fairness of Screening Classifiers
Authors: Nastaran Okati, Stratis Tsirtsis, Manuel Gomez Rodriguez
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we create multiple instances of a simulated screening process using US Census survey data to first investigate how frequently within-group unfairness occurs in a recruiting domain and then compare the partitions, as well as induced screening classifiers, provided by Algorithms 1, 2 and 3. We use a dataset consisting of 3.2 million individuals from the US Census (Ding et al., 2021). Each individual is represented by sixteen features and one label y ∈ {0, 1} indicating whether the individual is employed (y = 1) or not (y = 0). |
| Researcher Affiliation | Academia | 1Max Planck Institute for Software Systems. Correspondence to: Nastaran Okati <nastaran@mpi-sws.org>. |
| Pseudocode | Yes | Algorithm 1 returns a partition B_pav such that f_{B_pav} is within-group monotone. Algorithm 2 returns the optimal partition B such that f_B is within-group monotone. Algorithm 3 returns the optimal partition B_cal such that f_{B_cal} is within-group calibrated. |
| Open Source Code | Yes | An implementation of our algorithms and the data used in our experiments are available at https://github.com/Networks-Learning/within-group-monotonicity. |
| Open Datasets | Yes | We use a dataset consisting of 3.2 million individuals from the US Census (Ding et al., 2021). For the experiments, we randomly split the dataset into two equally-sized and disjoint subsets. We use the first subset for training and calibration and the second subset for testing. More specifically, for each experiment, we create the training and calibration sets Dtr and Dcal by picking 100,000 and 50,000 individuals at random (without replacement) from the first subset. |
| Dataset Splits | Yes | More specifically, for each experiment, we create the training and calibration sets Dtr and Dcal by picking 100,000 and 50,000 individuals at random (without replacement) from the first subset. We use Dtr to train a logistic regression model f_LR and use Dcal to (approximately) calibrate f_LR using uniform mass binning (UMB) (Wang et al., 2022; Zadrozny & Elkan, 2001). |
| Hardware Specification | Yes | We ran all experiments on a machine equipped with 48 Intel(R) Xeon(R) 2.50GHz CPU cores and 256GB memory. |
| Software Dependencies | No | The paper mentions using a "logistic regression model" but does not specify the software libraries or their version numbers (e.g., Python, scikit-learn, PyTorch, etc.) used for implementation. |
| Experiment Setup | Yes | We use Dtr to train a logistic regression model f_LR and use Dcal to both (approximately) calibrate f_LR using uniform mass binning (UMB) (Wang et al., 2022; Zadrozny & Elkan, 2001), i.e., discretize its outputs to n calibrated quality scores, and estimate the relevant probabilities ρ_i, a_i, ρ_{z|i} and a_{i,z} needed by Algorithms 1, 2, and 3. We experiment with several screening classifiers f with a varying number of bins n. |
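The setup above trains a logistic regression model and then calibrates it with uniform mass binning (UMB), i.e., discretizing its scores into n equal-count bins and assigning each bin its empirical positive rate. A minimal sketch of that calibration step is below; the function and variable names are illustrative, not taken from the authors' code (their implementation is at https://github.com/Networks-Learning/within-group-monotonicity).

```python
# Hedged sketch of uniform mass binning (UMB) calibration, as described in
# the Experiment Setup row. Assumes scores from an already-trained classifier
# (e.g., logistic regression); names like `umb_calibrate` are hypothetical.
import numpy as np

def umb_calibrate(scores_cal, labels_cal, n_bins):
    """Uniform mass binning: bin edges at score quantiles so each bin holds
    roughly equal mass; each bin's calibrated score is its empirical
    positive rate on the calibration set."""
    edges = np.quantile(scores_cal, np.linspace(0.0, 1.0, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # cover the full score range
    bin_idx = np.digitize(scores_cal, edges[1:-1])  # bin index in 0..n_bins-1
    bin_scores = np.array(
        [labels_cal[bin_idx == b].mean() for b in range(n_bins)]
    )

    def predict(scores):
        # Map raw scores to the calibrated score of the bin they fall into
        return bin_scores[np.digitize(scores, edges[1:-1])]

    return predict

# Toy usage with synthetic calibration data (not the US Census data)
rng = np.random.default_rng(0)
s = rng.uniform(size=1000)                      # raw classifier scores
y = (rng.uniform(size=1000) < s).astype(int)    # labels correlated with scores
f_cal = umb_calibrate(s, y, n_bins=10)
out = f_cal(np.array([0.05, 0.5, 0.95]))        # three calibrated scores
```

Note that UMB yields at most n distinct output values, which is exactly the discretization into n quality scores that Algorithms 1, 2, and 3 then repartition.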