Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fairness Under Demographic Scarce Regime
Authors: Patrik Joslin Kenfack, Samira Ebrahimi Kahou, Ulrich Aïvodji
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on five datasets showed that the proposed framework yields models with significantly better fairness-accuracy tradeoffs than classic attribute classifiers. We perform extensive experiments on a wide range of real-world datasets to demonstrate the effectiveness of the proposed framework compared to existing methods. |
| Researcher Affiliation | Academia | Patrik Joslin Kenfack EMAIL ÉTS Montréal, Mila Samira Ebrahimi Kahou EMAIL University of Calgary, Mila Ulrich Aïvodji EMAIL ÉTS Montréal, Mila |
| Pseudocode | No | The paper describes the methodology and its stages in text, and Figure 1 provides a high-level overview diagram. However, it does not include any clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps. |
| Open Source Code | Yes | The source code is available at https://github.com/patrikken/fair-dsr. |
| Open Datasets | Yes | We validate our method on five real-world benchmarks widely used for bias assessment: Adult Income (Asuncion & Newman, 2007), Compas (Jeff et al., 2016), Law School (LSAC) (Wightman, 1998), CelebA (Liu et al., 2018), and the New Adult (Ding et al., 2021) dataset. |
| Dataset Splits | Yes | We use 20% of each dataset as the group-labeled dataset (D2) and 80% as the dataset without sensitive attributes (D1). All the baselines are trained on 70% of D1, and fairness and accuracy are evaluated on the 30% as the test set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions 'Pytorch (Paszke et al., 2019)', 'Adam optimizer (Kingma & Ba, 2014)', 'Fairlearn (Bird et al., 2020)', and 'MAPIE library (Cordier et al., 2023)'. While these citations refer to specific publications, they do not provide explicit software version numbers (e.g., PyTorch 1.9) as required for reproducibility. |
| Experiment Setup | Yes | The student and teacher models were implemented as feed-forward Multi-layer Perceptrons (MLPs) with PyTorch (Paszke et al., 2019), and loss function 1 is minimized using the Adam optimizer (Kingma & Ba, 2014) with learning rate 0.001 and batch size 256. Following Yu et al. (2019); Laine & Aila (2017), we used α = 0.99 for the EMA parameter for updating the teacher weights using the student's weights across epochs. The uncertainty threshold is finetuned over the interval [0.1, 0.7] using 10% of the training data. The best-performing threshold is used as the second step's threshold to obtain D1. The uncertainty thresholds that achieved the best results are 0.30, 0.60, 0.66, and 0.45 for the Adult, Compas, LSAC, and CelebA datasets, respectively. Random forest was initialized with a maximum depth of 5 and a minimum sample leaf of 10. |
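The reported setup combines two pieces that are easy to misread: the nested data splits (20% of each dataset as the group-labeled D2, 80% as D1, with baselines trained on 70% of D1 and evaluated on the remaining 30%) and the mean-teacher EMA weight update with α = 0.99. The sketch below illustrates both, using hypothetical helper names (`split_dataset`, `ema_update`) that are not from the authors' code:

```python
# Minimal sketch of the reported setup, NOT the authors' implementation:
# the 20%/80% group-label split, the 70%/30% train/test split of D1, and
# the EMA teacher-weight update with alpha = 0.99.

import random


def split_dataset(samples, frac, seed=0):
    """Shuffle and split `samples` into (first `frac` share, remainder)."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    cut = int(frac * len(samples))
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]


def ema_update(teacher_weights, student_weights, alpha=0.99):
    """Mean-teacher update: teacher <- alpha * teacher + (1 - alpha) * student."""
    return [alpha * t + (1 - alpha) * s
            for t, s in zip(teacher_weights, student_weights)]


if __name__ == "__main__":
    data = list(range(1000))
    d2, d1 = split_dataset(data, 0.20)     # 20% group-labeled (D2), 80% D1
    train, test = split_dataset(d1, 0.70)  # baselines: 70% of D1 train, 30% test
    print(len(d2), len(d1), len(train), len(test))  # 200 800 560 240

    teacher = ema_update([0.0, 0.0], [1.0, 2.0])  # approximately [0.01, 0.02]
```

With α = 0.99 the teacher moves only 1% of the way toward the student per update, which is what makes it a slowly varying average of the student's weights across epochs.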