reproducibilityindex.ai

Standardized Interpretable Fairness Measures for Continuous Risk Scores

Authors: Ann-Kristin Becker, Oana Dumitrasc, Klaus Broelemann

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Section 5 contains results of experiments using benchmark data and Section 6 includes final discussion and outlook.
Researcher Affiliation	Industry	SCHUFA Holding AG, Wiesbaden, Germany. Correspondence to: Ann-Kristin Becker <ann-kristin.becker@schufa.de>, Oana Dumitrasc <oana.dumitrasc@schufa.de>, Klaus Broelemann <klaus.broelemann@schufa.de>.
Pseudocode	No	The paper contains mathematical definitions, theorems, and proofs but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code	Yes	The code used for the experiments in this study is online available [5]. The repository includes detailed instructions for reproducing the results. [5] https://github.com/schufa-innovationlab/fair-scoring
Open Datasets	Yes	We use the COMPAS dataset [2], the Adult dataset [3] and the German Credit dataset [4] to demonstrate the application of the fairness measures for continuous risk scores. [2] https://raw.githubusercontent.com/propublica/compas-analysis/master/compas-scores-two-years.csv [3] https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data [4] https://www.kaggle.com/datasets/uciml/germancredit?resource=download
Dataset Splits	No	The paper states: 'Both models have been trained on 70% of the dataset and evaluated on the remaining samples.' and 'All three models have been trained on 70% of the dataset and evaluated on the remaining samples.' This describes a train/test split but does not explicitly mention a separate validation set.
Hardware Specification	No	The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models, or cloud computing instance specifications.
Software Dependencies	No	The paper mentions software components like 'scipy.wasserstein_distance', 'sklearn.calibration.calibration_curve', and 'sklearn.metrics.roc_curve' in Appendix C.1, but it does not specify their version numbers.
Experiment Setup	No	The paper describes general aspects of the experiment setup, such as training logistic regression and XGBoost models and data preprocessing (min-max-scaling, one-hot-encoding), but it does not specify concrete hyperparameters like learning rates, batch sizes, or optimizer settings for these models.
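
The setup elements named in the table (a 70/30 train/test split, min-max scaling, a logistic regression model, and a Wasserstein distance between score distributions) can be sketched as follows. This is a minimal illustration, not the authors' code: the synthetic data, the variable names, the random seed, and the default model settings are all assumptions, since the paper reports no hyperparameters. One-hot encoding is skipped because the synthetic features are purely numeric, and the function is imported from `scipy.stats`, where SciPy exposes it.

```python
import numpy as np
from scipy.stats import wasserstein_distance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): three numeric features whose mean
# depends on a hypothetical binary protected attribute `group`.
rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, size=n)
X = rng.normal(loc=(0.5 * group)[:, None], scale=1.0, size=(n, 3))
y = (X.sum(axis=1) + rng.normal(size=n) > 0).astype(int)

# 70% of the samples for training, the remainder for evaluation.
X_train, X_test, y_train, y_test, g_train, g_test = train_test_split(
    X, y, group, train_size=0.7, random_state=0
)

# Min-max scaling is fitted on the training portion only.
scaler = MinMaxScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)

# Continuous risk scores in [0, 1] for the held-out samples.
scores = model.predict_proba(scaler.transform(X_test))[:, 1]

# Wasserstein distance between the score distributions of the two groups.
gap = wasserstein_distance(scores[g_test == 0], scores[g_test == 1])
print(f"Wasserstein distance between group score distributions: {gap:.3f}")
```

Pinning the installed versions of scipy and scikit-learn alongside such a script would address the missing version numbers flagged under Software Dependencies.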