Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
SenSeI: Sensitive Set Invariance for Enforcing Individual Fairness
Authors: Mikhail Yurochkin, Yuekai Sun
ICLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, in the experimental studies we demonstrate improved fairness metrics in comparison to several recent fair training procedures on three ML tasks that are susceptible to algorithmic bias. |
| Researcher Affiliation | Collaboration | Mikhail Yurochkin, IBM Research, MIT-IBM Watson AI Lab; Yuekai Sun, Department of Statistics, University of Michigan |
| Pseudocode | Yes | Algorithm 1 SenSeI: Sensitive Set Invariance |
| Open Source Code | No | We will open-source the code and merge variable names with their abbreviations |
| Open Datasets | Yes | Data is available through the Toxic Comment Classification Challenge Kaggle competition. De-Arteaga et al. (2019) proposed the Bias in Bios dataset to study fairness in occupation prediction from a person's bio. The Adult dataset (Bache & Lichman, 2013) is a common benchmark in the group fairness literature. |
| Dataset Splits | Yes | We repeat our experiment 10 times with random 70-30 train-test splits, every time utilizing a random subset of 25 counterfactuals during training. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or memory) are provided. |
| Software Dependencies | No | The paper mentions BERT but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | In Table 4, for each hyperparameter we summarize its meaning, abbreviation, name in the code provided with the submission, and the methods where it is used. To select hyperparameters for each experiment we performed a grid search on an independent train-test split. Then we fixed the selected hyperparameters and ran 10 experiment repetitions with random train-test splits (these results are reported in the main text). Hyperparameter choices for all experiments are summarized in Tables 5, 6, 7. |
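The evaluation protocol reported in the table (10 repetitions with random 70-30 train-test splits, metrics aggregated across runs) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the toy dataset and the `evaluate` stand-in are hypothetical placeholders for the paper's datasets (Adult, Bias in Bios, toxic comments) and trained models.

```python
import random
from statistics import mean, stdev

def random_split(data, train_frac=0.7, seed=None):
    """Shuffle indices and split into train/test (70-30 by default)."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    cut = int(train_frac * len(data))
    return [data[i] for i in idx[:cut]], [data[i] for i in idx[cut:]]

def evaluate(train, test):
    # Stand-in for training a model on `train` and computing a
    # fairness/accuracy metric on `test`; here it just returns the
    # test-set label mean of a toy dataset.
    return mean(label for _, label in test)

# Toy (feature, label) pairs -- a placeholder for the real datasets.
data = [(i, i % 2) for i in range(100)]

# 10 repetitions with random 70-30 train-test splits, as in the paper;
# results are then reported as mean +/- standard deviation across runs.
scores = [evaluate(*random_split(data, seed=rep)) for rep in range(10)]
print(f"mean={mean(scores):.3f} sd={stdev(scores):.3f}")
```

The per-repetition seed makes each split reproducible while still varying across the 10 runs, matching the "random train-test splits" described above.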