Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Preserving Fairness in AI under Domain Shift
Authors: Serban Stan, Mohammad Rostami
JAIR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical validation on three common fairness datasets to show that the challenge exists in practical settings and to demonstrate the effectiveness of our algorithm. We conduct extensive empirical explorations and demonstrate that the existing methods for fairness in AI are vulnerable in our learning setting and show that the proposed algorithm is effective in maintaining both the model performance and its fairness. |
| Researcher Affiliation | Academia | Serban Stan EMAIL Mohammad Rostami EMAIL University of Southern California Los Angeles, CA, USA |
| Pseudocode | Yes | Algorithm 1 Fair Adapt (thresh, ITR) |
| Open Source Code | Yes | Our implementation code is available at: https://github.com/rostami-m/FairUDA/. |
| Open Datasets | Yes | We perform experiments on three datasets widely used by the AI fairness community. ... The UCI Adult dataset is part of the UCI database (Dua & Graff, 2017) ... The UCI German credit dataset contains financial information for 1000 different people applying for credit and is also part of the UCI database. ... The Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) recidivism dataset maintains information on over 5,000 individuals' criminal records. |
| Dataset Splits | Yes | Experiments on these datasets have primarily considered random 70/30 splits for the training and test splits. While such data splits are useful in evaluating overfitting for fairness algorithms, features for training and test sets will be sampled from the same data distribution. As a result, randomly splitting the datasets is not suitable for our learning setting because domain shift will not exist between the training and the testing splits. Instead, we consider natural data splits obtained from sub-sampling the three datasets along different criteria to generate the training and testing splits. ... For details about the splits for each dataset, please refer to Appendix A. |
| Hardware Specification | No | The paper does not explicitly mention specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | Our implementation of our approach is done using the PyTorch (Paszke et al., 2019) deep learning library. The specific version of PyTorch is not explicitly stated. |
| Experiment Setup | Yes | We model our encoder eu as a one-layer neural network with output space z ∈ R^20. Classifiers g and h are also one-layer networks with output space R^2. We train our model for 45,000 iterations, where the first 30,000 iterations only involve source training. For the first 15,000 we only perform minimization of the binary cross entropy loss Lbce. We introduce source fairness training at iteration 15,000, and train the fair model, i.e. with respect to both Lbce and Lfair, for 15,000 more iterations. In the last 15,000 iterations we perform adaptation, where we optimize Lbce, Lfair on the source domain, Lfair on the target domain, and Lswd between the source and target embeddings eu((xs, as)), eu((xt, at)) respectively. We use a learning rate for Lbce, Lfair of 1e-4, and a learning rate for Lswd of 1e-5. |
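The three-phase schedule quoted in the "Experiment Setup" row can be sketched as a simple iteration-to-losses mapping. The phase boundaries (iterations 15,000 / 30,000 / 45,000) and the learning rates come from the excerpt above; the function name, loss labels, and dictionary layout are illustrative, not from the paper.

```python
# Sketch of the three-phase training schedule from the paper's
# "Experiment Setup" description. Iteration boundaries and learning
# rates are as reported; names and return format are illustrative.

def active_losses(itr: int) -> set[str]:
    """Return the losses optimized at training iteration `itr`."""
    if itr < 15_000:
        # Phase 1: source-only pre-training on binary cross entropy.
        return {"Lbce(source)"}
    elif itr < 30_000:
        # Phase 2: source fairness training is added at iteration 15,000.
        return {"Lbce(source)", "Lfair(source)"}
    elif itr < 45_000:
        # Phase 3: adaptation, including target fairness and the
        # sliced Wasserstein distance between source/target embeddings.
        return {"Lbce(source)", "Lfair(source)",
                "Lfair(target)", "Lswd(source, target)"}
    raise ValueError("training runs for 45,000 iterations total")

# Learning rates reported in the paper: 1e-4 for Lbce/Lfair, 1e-5 for Lswd.
LEARNING_RATES = {"Lbce": 1e-4, "Lfair": 1e-4, "Lswd": 1e-5}
```

Laying the schedule out this way makes it easy to verify that only the first 30,000 iterations are pure source training, matching the quoted description.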