Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Reproducibility Study: Equal Improvability: A New Fairness Notion Considering the Long-Term Impact

Authors: Berkay Chakar, Amina Izbassar, Mina Janićijević, Jakub Tomaszewski

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This reproducibility study aims to evaluate the robustness of Equal Improvability (EI), an effort-based framework for ensuring long-term fairness. To this end, we seek to analyze the three proposed EI-ensuring regularization techniques, i.e., Covariance-based, KDE-based, and Loss-based EI. Our findings largely substantiate the initial assertions, demonstrating EI's enhanced performance over Empirical Risk Minimization (ERM) techniques on various test datasets.
Researcher Affiliation | Academia | Berkay Chakar EMAIL Amina Izbassar EMAIL Mina Janićijević EMAIL Jakub Tomaszewski EMAIL
Pseudocode | No | The paper describes the methods and equations in prose and mathematical notation (e.g., in Section 3.1 and Appendix A) but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Our code is publicly available on GitHub.
Open Datasets | Yes | To assess the reliability of the feature and example importance methods, the authors developed one synthetic dataset and also employed two real-world datasets, specifically the German Statlog Credit and ACSIncome CA (Dua & Graff, 2017; Ding et al., 2021). Our study broadened this analysis by incorporating the Default of Credit Card Clients Dataset (DCC Dataset), which was selected due to its inherent gender and age bias, providing a relevant case for testing fairness in models (Yeh, 2016).
Dataset Splits | Yes | All datasets used for the experiments were split into training/test sets in the ratio of 4:1.
Hardware Specification | Yes | Specifically, we employed an NVIDIA T4 GPU and an Apple M1 Pro CPU chip for this purpose.
Software Dependencies | No | The initial challenge was in setting up the required environment. The provided requirements file listed incorrect dependencies, which hindered the creation of a compatible environment. We had to revise and adjust the dependency versions to ensure their mutual compatibility, which was more time-consuming than anticipated.
Experiment Setup | Yes | The authors of the original study provided detailed hyperparameter configurations in Appendix C.2 of their paper, as well as within the supplementary notebooks. We adhere to these specified hyperparameters for our replication efforts, ensuring consistency with the original experiments. When conducting additional experiments, we utilize similar hyperparameter settings to ensure that our results are comparable.
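The 4:1 training/test split reported in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not the authors' actual preprocessing code; the function name, seed, and data are hypothetical.

```python
import random

def split_4_to_1(rows, seed=0):
    """Shuffle and split a dataset into train/test at a 4:1 (80/20) ratio.

    Hypothetical helper for illustration; the study's own split
    procedure may differ in shuffling and seeding details.
    """
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = len(rows) // 5  # 1 part test out of every 5 rows
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test

train, test = split_4_to_1(list(range(100)))
print(len(train), len(test))  # 80 20
```

A fixed seed keeps the split reproducible across runs, which matters when comparing regularizers trained on the same partitions.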