Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Properties of Alternative Data for Fairer Credit Risk Predictions
Authors: Jung Youn Lee, Joonhyuk Yang
DMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By leveraging unique data on individuals' credit card default behaviors and their purchase behaviors at a supermarket, we simulate a credit card issuer's credit scoring process. In the absence of supermarket data, the algorithm's predictive accuracy for women is about 2.3% lower than that for men. We then integrate data from each of the 410 product markets within the supermarket into the algorithm and measure the changes in the gender gap in predictive accuracy. |
| Researcher Affiliation | Academia | Jung Youn Lee EMAIL Jones Graduate School of Business Rice University Houston, TX 77005, USA Joonhyuk Yang EMAIL Mendoza College of Business University of Notre Dame Notre Dame, IN 46556, USA |
| Pseudocode | No | The paper describes the methodology in prose, detailing steps like data splitting, algorithm training, and evaluation. However, it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor does it present structured steps formatted like code or an algorithm. |
| Open Source Code | No | As our data is protected by a non-disclosure agreement with the data provider, we are unable to publicize the data. |
| Open Datasets | No | Our analysis benefits from a unique data set consisting of first-party data from two firms owned by a conglomerate: a credit card issuer and a large-scale supermarket chain.¹ Using a customer identifier, we can merge data from both firms at the individual level, which allows us to build a hypothetical credit scoring algorithm that leverages supermarket transaction data. Another valuable aspect of our data set is that the conglomerate collects gender information from its members, although this information is not used for the issuer's credit decisions. This feature enables us to empirically explore the impact of using alternative data separately for men and women. This is a significant advantage, as empirical research on gender-based disparities in consumer credit markets has been limited. This limitation is largely due to the fact that gender is a protected class in many countries, including the United States, the United Kingdom, Canada, the European Union, Australia, and India. Consequently, lenders are often prohibited from collecting applicants' gender information or linking it with credit data. ¹ As our data is protected by a non-disclosure agreement with the data provider, we are unable to publicize the data. In Appendix A, we provide summary statistics for the features derived from the data. |
| Dataset Splits | Yes | To train the algorithms, we randomly split our final data set consisting of 30,005 consumers into a training set (70%) and a test set (30%). In doing so, we employ a stratified sampling approach to maintain an identical share of defaulters in both sets. |
| Hardware Specification | No | The paper describes the use of XGBoost for training models and discusses the iteration process, but it does not specify any particular hardware like GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | Yes | Then, we train a binary classifier on the training data using XGBoost (eXtreme Gradient Boosting; Chen and Guestrin, 2016) to predict whether an individual will default, where its hyperparameters are tuned through a Bayesian optimization method with 10-fold cross-validation (Wilson, 2021). ... (Wilson, 2021) refers to "ParBayesianOptimization: Parallel Bayesian optimization of hyperparameters. R package version 1(4):935, 2021." |
| Experiment Setup | Yes | Then, we train a binary classifier on the training data using XGBoost (eXtreme Gradient Boosting; Chen and Guestrin, 2016) to predict whether an individual will default, where its hyperparameters are tuned through a Bayesian optimization method with 10-fold cross-validation (Wilson, 2021). We also re-scale the gradient for the positive class (i.e., default) and overcorrect errors related to the class in order to mitigate the potential impact of class imbalance (i.e., having significantly more non-defaulters in the data than defaulters) on the predictive performance of our algorithms. The algorithm's out-of-sample predictive performance is evaluated on the test set. This entire process is iterated 1,500 times, each with different random training/test data splits and hyperparameter tuning. |
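The pipeline described in the Dataset Splits and Experiment Setup rows (stratified 70/30 split, gradient-boosted classifier with positive-class re-weighting, repeated over random splits) can be sketched as follows. This is a minimal illustration, not the authors' code: the paper used XGBoost in R with ParBayesianOptimization, while this sketch uses scikit-learn's `GradientBoostingClassifier` on synthetic data (the real data set is under an NDA), substitutes a `sample_weight` up-weighting of defaulters for XGBoost's gradient re-scaling, and omits the Bayesian hyperparameter tuning and runs 3 iterations instead of 1,500.

```python
# Hypothetical sketch of the evaluation loop; assumptions noted in the lead-in.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 30,005-consumer data set, with a defaulter
# minority class to mimic the class imbalance discussed in the paper.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

aucs = []
for seed in range(3):  # the paper iterates this process 1,500 times
    # 70/30 split; stratify=y keeps an identical share of defaulters
    # in the training and test sets, as the paper describes.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)
    # Up-weight the positive (default) class by the non-default/default
    # ratio -- a stand-in for re-scaling the gradient for that class.
    w = np.where(y_tr == 1, (y_tr == 0).sum() / (y_tr == 1).sum(), 1.0)
    clf = GradientBoostingClassifier(random_state=seed)
    clf.fit(X_tr, y_tr, sample_weight=w)
    # Out-of-sample predictive performance on the held-out test set.
    aucs.append(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))

print(f"mean test AUC over {len(aucs)} splits: {np.mean(aucs):.3f}")
```

Measuring the gender gap would additionally require splitting the test-set metric by the (protected) gender attribute, which the synthetic data above does not model.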