Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Individual Fairness In Strategic Classification

Authors: Zhiqun Zuo, Mohammad Mahdi Khalili

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on real-world datasets confirm that our method effectively mitigates unfairness and improves the fairness-accuracy trade-off. ... In this section, we conduct experiments on two real-world datasets to evaluate the effectiveness of our proposed methods.
Researcher Affiliation	Academia	Zhiqun Zuo, Mohammad Mahdi Khalili; Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210
Pseudocode	Yes	Algorithm 1 Finding optimal randomized classifier satisfying individual fairness w.r.t. BRC Input: Training data $D = \{x_i, y_i\}_{i=1}^{N}$, $\lambda$, $M_c$, function $g$, $l$ ... Algorithm 2 Inference with a randomized classifier Input: Data point $x$, threshold distribution $p_{c_k}$, intervals $(s_k, s_{k+1})$, $k \in \{1, \dots, K\}$, function $l$
Open Source Code	Yes	Justification: We present the details of experiment implementation in Section 6 and Appendix B for reproducing our results. We would also release the code and data used in the experiments to enhance reproducibility. ... Justification: We submit the code and dataset we used to produce the experiment results with our paper.
Open Datasets	Yes	The first dataset is the FICO dataset, preprocessed by [15]. This dataset provides the cumulative distribution of credit scores across different racial groups. ... The second dataset is the Law School Dataset [33], which contains 18,692 student records.
Dataset Splits	Yes	We split the dataset into train/validation/test datasets randomly with a ratio of 60%/20%/20%.
Hardware Specification	Yes	All experiments are conducted on a server equipped with 64 AMD EPYC 7313 16-Core CPUs. The server also includes 8 NVIDIA RTX A5000 GPUs (24GB each), although GPU resources are not utilized for our experiments.
Software Dependencies	No	The paper does not explicitly list specific software dependencies with version numbers within its main text or appendices.
Experiment Setup	Yes	For the Law School dataset, we set the number of bins K to 80 and set it to 200 for FICO dataset. ... The baseline model for individual fairness experiment (Tables 1 and 2) is the best deterministic classifier. For the group fairness experiments (Tables 3 and 4), the baseline model consists of two deterministic thresholds (one for each group). We find the deterministic thresholds using grid search on all possible combinations which will maximize the macro average F1 score while satisfying group fairness constraints. ... We define the individual utility function as $U_{\text{indiv}}(f) = [f(x') - f(x)] - 100\,(l(x') - l(x))^2$, with the associated cost function given by $c(x, x') = 100\,(l(x') - l(x))^2$. ... We define the individual utility function as $U_{\text{indiv}}(f) = [f(x') - f(x)] - (l(x') - l(x))^2$, with the corresponding cost function $c(x, x') = (l(x') - l(x))^2$.
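
As a rough illustration of the utility and cost definitions quoted in the Experiment Setup row, the Python sketch below evaluates the gain from a favorable decision minus a quadratic manipulation cost, and picks the utility-maximizing move. It rests on several assumptions that do not come from the paper or its released code: the classifier f depends on x only through the score l(x), the individual chooses from a finite grid of candidate scores, and f is a single deterministic threshold. The function names, the threshold 0.5, and the grid are hypothetical.

```python
def cost(l_x, l_x_new, scale=100.0):
    """Quadratic manipulation cost c(x, x') = scale * (l(x') - l(x))^2."""
    return scale * (l_x_new - l_x) ** 2


def individual_utility(f, l_x, l_x_new, scale=100.0):
    """U_indiv(f) = [f(x') - f(x)] - c(x, x'), with f applied to the score."""
    return (f(l_x_new) - f(l_x)) - cost(l_x, l_x_new, scale)


def best_response(f, l_x, candidates, scale=100.0):
    """Return the candidate score with the highest utility.

    The current score is always an option (utility 0), so an individual
    only moves when the gain strictly exceeds the cost.
    """
    options = [l_x] + list(candidates)
    return max(options, key=lambda s: individual_utility(f, l_x, s, scale))


def threshold_classifier(threshold):
    """Hypothetical deterministic classifier: accept iff score >= threshold."""
    return lambda score: 1.0 if score >= threshold else 0.0


f = threshold_classifier(0.5)            # assumed threshold, for illustration
grid = [i / 100 for i in range(101)]     # assumed candidate scores in [0, 1]

near = best_response(f, 0.45, grid)      # cost of reaching 0.5 is about 0.25
far = best_response(f, 0.30, grid)       # cost of reaching 0.5 is about 4.0
print(near, far)
```

With scale=100, an individual at score 0.45 pays roughly 0.25 to reach the threshold and gains 1, so moving is worthwhile; an individual at 0.30 would pay roughly 4, so staying put is the better choice. Setting scale=1.0 corresponds to the second cost function quoted above.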