Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Reproducibility Study: Equal Improvability: A New Fairness Notion Considering the Long-Term Impact

Authors: Berkay Chakar, Amina Izbassar, Mina Janićijević, Jakub Tomaszewski

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This reproducibility study aims to evaluate the robustness of Equal Improvability (EI), an effort-based framework for ensuring long-term fairness. To this end, we seek to analyze the three proposed EI-ensuring regularization techniques, i.e., Covariance-based, KDE-based, and Loss-based EI. Our findings largely substantiate the initial assertions, demonstrating EI's enhanced performance over Empirical Risk Minimization (ERM) techniques on various test datasets.
Researcher Affiliation | Academia | Berkay Chakar EMAIL Amina Izbassar EMAIL Mina Janićijević EMAIL Jakub Tomaszewski EMAIL
Pseudocode | No | The paper describes the methods and equations in prose and mathematical notation (e.g., in Section 3.1 and Appendix A) but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Our code is publicly available on GitHub.
Open Datasets | Yes | To assess the reliability of the feature and example importance methods, the authors developed one synthetic dataset and also employed two real-world datasets, specifically the German Statlog Credit and ACSIncome CA (Dua & Graff, 2017; Ding et al., 2021). Our study broadened this analysis by incorporating the Default of Credit Card Clients Dataset (DCC Dataset), which was selected due to its inherent gender and age bias, providing a relevant case for testing fairness in models (Yeh, 2016).
Dataset Splits | Yes | All datasets used for the experiments were split into training/test sets in the ratio of 4:1.
Hardware Specification | Yes | Specifically, we employed an NVIDIA T4 GPU and an Apple M1 Pro CPU chip for this purpose.
Software Dependencies | No | The initial challenge was in setting up the required environment. The provided requirements file listed incorrect dependencies, which hindered the creation of a compatible environment. We had to revise and adjust the dependency versions to ensure their mutual compatibility, which was more time-consuming than anticipated.
Experiment Setup | Yes | The authors of the original study provided detailed hyperparameter configurations in Appendix C.2 of their paper, as well as within the supplementary notebooks. We adhere to these specified hyperparameters for our replication efforts, ensuring consistency with the original experiments. When conducting additional experiments, we utilize similar hyperparameter settings to ensure that our results are comparable.
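The 4:1 training/test split reported in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not the authors' actual preprocessing code; the function name, seed, and data are hypothetical.

```python
import random

def split_4_to_1(rows, seed=0):
    """Shuffle and split a dataset into train/test at a 4:1 (80/20) ratio.

    Hypothetical helper for illustration; the study's own split
    procedure may differ in shuffling and seeding details.
    """
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = len(rows) // 5  # 1 part test out of every 5 rows
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test

train, test = split_4_to_1(list(range(100)))
print(len(train), len(test))  # 80 20
```

A fixed seed keeps the split reproducible across runs, which matters when comparing regularizers trained on the same partitions.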