Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
SenSeI: Sensitive Set Invariance for Enforcing Individual Fairness
Authors: Mikhail Yurochkin, Yuekai Sun
ICLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, in the experimental studies we demonstrate improved fairness metrics in comparison to several recent fair training procedures on three ML tasks that are susceptible to algorithmic bias. |
| Researcher Affiliation | Collaboration | Mikhail Yurochkin, IBM Research, MIT-IBM Watson AI Lab; Yuekai Sun, Department of Statistics, University of Michigan |
| Pseudocode | Yes | Algorithm 1 SenSeI: Sensitive Set Invariance |
| Open Source Code | No | We will open-source the code and merge variable names with their abbreviations |
| Open Datasets | Yes | Data is available through the Toxic Comment Classification Challenge Kaggle competition. De-Arteaga et al. (2019) proposed the Bias in Bios dataset to study fairness in occupation prediction from a person's bio. The Adult dataset (Bache & Lichman, 2013) is a common benchmark in the group fairness literature. |
| Dataset Splits | Yes | We repeat our experiment 10 times with random 70-30 train-test splits, every time utilizing a random subset of 25 counterfactuals during training. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or memory) are provided. |
| Software Dependencies | No | The paper mentions BERT but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | In Table 4, for each hyperparameter we summarize its meaning, abbreviation, name in the code provided with the submission, and the methods where it is used. To select hyperparameters for each experiment we performed a grid search on an independent train-test split. Then we fixed the selected hyperparameters and ran 10 experiment repetitions with random train-test splits (these results are reported in the main text). Hyperparameter choices for all experiments are summarized in Tables 5, 6, 7. |
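The evaluation protocol reported in the table (10 repetitions with random 70-30 train-test splits, metrics aggregated across runs) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the toy dataset and the `evaluate` stand-in are hypothetical placeholders for the paper's datasets (Adult, Bias in Bios, toxic comments) and trained models.

```python
import random
from statistics import mean, stdev

def random_split(data, train_frac=0.7, seed=None):
    """Shuffle indices and split into train/test (70-30 by default)."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    cut = int(train_frac * len(data))
    return [data[i] for i in idx[:cut]], [data[i] for i in idx[cut:]]

def evaluate(train, test):
    # Stand-in for training a model on `train` and computing a
    # fairness/accuracy metric on `test`; here it just returns the
    # test-set label mean of a toy dataset.
    return mean(label for _, label in test)

# Toy (feature, label) pairs -- a placeholder for the real datasets.
data = [(i, i % 2) for i in range(100)]

# 10 repetitions with random 70-30 train-test splits, as in the paper;
# results are then reported as mean +/- standard deviation across runs.
scores = [evaluate(*random_split(data, seed=rep)) for rep in range(10)]
print(f"mean={mean(scores):.3f} sd={stdev(scores):.3f}")
```

The per-repetition seed makes each split reproducible while still varying across the 10 runs, matching the "random train-test splits" described above.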