Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Fair Representation Learning with Controllable High Confidence Guarantees via Adversarial Inference

Authors: Yuhong Luo, Austin Hoag, Xintong Wang, Philip S. Thomas, Przemyslaw Grabowicz

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically evaluate FRG on three real-world datasets, comparing its performance to six state-of-the-art fair representation learning methods. Our results demonstrate that FRG consistently bounds unfairness across a range of downstream models and tasks. [...] (Section 6, Experiments) Here, we evaluate the performance and fairness of FRG, focusing on the following research questions.
Researcher Affiliation	Collaboration	Yuhong Luo, Rutgers University, New Brunswick, NJ, USA; Austin Hoag, Sony AI, New York, NY, USA; Xintong Wang, Rutgers University, New Brunswick, NJ, USA; Philip S. Thomas, University of Massachusetts, Amherst, MA, USA; Przemyslaw A. Grabowicz, (1) University College Dublin, Ireland, and (2) University of Massachusetts, Amherst, MA, USA.
Pseudocode	No	The paper only describes the methodology in narrative text and block diagrams (Figure 1) without structured pseudocode or algorithm blocks.
Open Source Code	Yes	The source code for FRG is available at: https://github.com/JamesLuoyh/FRG.
Open Datasets	Yes	We use three real-world datasets each with at least two downstream tasks, including the adversarial tasks that predict the sensitive attributes: UCI Adult [6] and Income (California only, commonly known as Retiring Adult) [17] both with 2 downstream tasks, and Heritage Health [34] with 3 downstream tasks.
Dataset Splits	Yes	For each dataset, we split the data into training (D_train), validation (D_val), and test (D_test) sets according to ratio 0.6:0.2:0.2. For FRG, we sample 10% of the training set to be D_f for fairness test and let candidate selection use the remaining 90% as D_c.
Hardware Specification	Yes	The GPU used is one NVIDIA A16, and we use 128 CPUs with the model AMD EPYC 9354 32-Core Processor.
Software Dependencies	No	The paper mentions the use of 'Adam optimizer' but does not provide specific version numbers for any programming languages, libraries, or other software dependencies.
Experiment Setup	Yes	In our hyperparameter tuning process, we adjust various parameters, including the step sizes (for the primary objective, the Lagrange multipliers, and the adversarial predictor), the initial Lagrange multipliers, the weight of the regularizers, the number of epochs, etc. [...] We set the minimum allowed step size for the primary objective to 10^-6 and the minimum number of epochs to 100. [...] We use cross-entropy loss for all downstream models and Adam optimizer for all optimizations. The detailed choices of hyperparameters for each of the datasets, the unfairness thresholds ε's, and the baselines are provided with config files in the source code.
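The Dataset Splits row above describes a two-level partition: a 0.6:0.2:0.2 train/validation/test split, followed by carving 10% of the training set out as D_f for the fairness test and leaving the other 90% as D_c for candidate selection. The following is a minimal sketch of that partition, assuming NumPy and a single uniform random shuffle. The paper's actual shuffling, seeding, and any stratification are not stated here, and the function name `frg_style_splits` is hypothetical, not taken from the FRG repository.

```python
import numpy as np


def frg_style_splits(n_samples, seed=0):
    """Return index arrays for D_train, D_val, D_test, D_c, and D_f.

    Hypothetical helper: assumes one uniform random permutation of the
    rows; the FRG code may shuffle, seed, or stratify differently.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)

    # 0.6 : 0.2 : 0.2 split. Integer arithmetic avoids float rounding;
    # any remainder rows fall into the test set.
    n_train = n_samples * 6 // 10
    n_val = n_samples * 2 // 10
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]

    # Within D_train: 10% held out as D_f (fairness test),
    # the remaining 90% used as D_c (candidate selection).
    n_f = len(train) // 10
    d_f = train[:n_f]
    d_c = train[n_f:]
    return {"train": train, "val": val, "test": test, "c": d_c, "f": d_f}


splits = frg_style_splits(1000)
print({name: len(part) for name, part in splits.items()})
# prints {'train': 600, 'val': 200, 'test': 200, 'c': 540, 'f': 60}
```

Because D_c and D_f are disjoint slices of the same shuffled training indices, the data used to select a candidate representation is never reused to test its fairness, which is the property the 90%/10% split is meant to provide.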