Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Standardized Interpretable Fairness Measures for Continuous Risk Scores
Authors: Ann-Kristin Becker, Oana Dumitrasc, Klaus Broelemann
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 contains results of experiments using benchmark data and Section 6 includes final discussion and outlook. |
| Researcher Affiliation | Industry | SCHUFA Holding AG, Wiesbaden, Germany. Correspondence to: Ann-Kristin Becker <EMAIL>, Oana Dumitrasc <EMAIL>, Klaus Broelemann <EMAIL>. |
| Pseudocode | No | The paper contains mathematical definitions, theorems, and proofs but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | The code used for the experiments in this study is online available [5]. The repository includes detailed instructions for reproducing the results. [5] https://github.com/schufa-innovationlab/fair-scoring |
| Open Datasets | Yes | We use the COMPAS dataset [2], the Adult dataset [3] and the German Credit dataset [4] to demonstrate the application of the fairness measures for continuous risk scores. [2] https://raw.githubusercontent.com/propublica/compas-analysis/master/compas-scores-two-years.csv [3] https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data [4] https://www.kaggle.com/datasets/uciml/germancredit?resource=download |
| Dataset Splits | No | The paper states: 'Both models have been trained on 70% of the dataset and evaluated on the remaining samples.' and 'All three models have been trained on 70% of the dataset and evaluated on the remaining samples.' This describes a train/test split but does not explicitly mention a separate validation set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models, or cloud computing instance specifications. |
| Software Dependencies | No | The paper mentions software components like 'scipy.wasserstein_distance', 'sklearn.calibration.calibration_curve', and 'sklearn.metrics.roc_curve' in Appendix C.1, but it does not specify their version numbers. |
| Experiment Setup | No | The paper describes general aspects of the experiment setup, such as training logistic regression and XGBoost models and data preprocessing (min-max-scaling, one-hot-encoding), but it does not specify concrete hyperparameters like learning rates, batch sizes, or optimizer settings for these models. |
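
The rows above mention a 70% training split, min-max scaling, logistic regression, and the SciPy and scikit-learn functions named in Appendix C.1. The following is a minimal sketch of how those pieces could fit together, not the authors' implementation (which lives in the linked repository). The synthetic data, the random seed, the stratified split, and the default model hyperparameters are all assumptions made for illustration, since the paper does not specify them. Note that the SciPy function is importable as `scipy.stats.wasserstein_distance`.

```python
import numpy as np
from scipy.stats import wasserstein_distance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (hypothetical; NOT one of the paper's datasets).
rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, size=n)                  # binary protected attribute
X = rng.normal(size=(n, 3)) + 0.5 * group[:, None]  # features shifted by group
y = (X[:, 0] + rng.normal(size=n) > 0.25).astype(int)

# 70% train / 30% test as stated in the paper; seed and stratification are assumptions.
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, random_state=0, stratify=y
)

# Min-max scaling fitted on the training part only, then a default logistic regression.
scaler = MinMaxScaler().fit(X_tr)
model = LogisticRegression().fit(scaler.transform(X_tr), y_tr)
scores = model.predict_proba(scaler.transform(X_te))[:, 1]  # continuous risk scores

# One illustrative group comparison: Wasserstein distance between the
# score distributions of the two groups on the held-out data.
gap = wasserstein_distance(scores[g_te == 0], scores[g_te == 1])

# ROC curve of the scores on the held-out data.
fpr, tpr, thresholds = roc_curve(y_te, scores)

print(f"test size: {len(scores)}, score gap between groups: {gap:.4f}")
```

Because the scores are probabilities in [0, 1], the distance computed here is also bounded by 0 and 1. Reproducing the paper's reported numbers would still require the unspecified hyperparameters and library versions flagged in the table.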