Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning Optimized Risk Scores
Authors: Berk Ustun, Cynthia Rudin
JMLR 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark the performance of different methods to learn risk scores on publicly available datasets, comparing risk scores produced by our method to risk scores built using methods that are used in practice. We also discuss the practical benefits of our method through a real-world application where we build a customized risk score for ICU seizure prediction in collaboration with the Massachusetts General Hospital. |
| Researcher Affiliation | Academia | Berk Ustun, Center for Research in Computation and Society, Harvard University; Cynthia Rudin, Department of Computer Science, Department of Electrical and Computer Engineering, and Department of Statistical Science, Duke University |
| Pseudocode | Yes | In Algorithm 1, we present a simple cutting plane algorithm to solve Risk Slim MINLP that we call CPA. To avoid stalling in non-convex settings, we solve the risk score problem using the lattice cutting plane algorithm (LCPA) shown in Algorithm 2. Discrete coordinate descent (DCD) is a technique to polish an integer solution (Algorithm 3). Sequential Rounding (Algorithm 4) is a rounding heuristic to generate integer solutions for the risk score problem. In Algorithm 7, we present an initialization procedure for LCPA. |
| Open Source Code | Yes | We provide a software package to build optimized risk scores in Python, available online at http://github.com/ustunb/risk-slim. |
| Open Datasets | Yes | We considered 6 publicly available datasets shown in Table 2. ... All datasets are available on the UCI repository (Bache and Lichman, 2013), other than rearrest which must be requested from ICPSR. We processed each dataset by dropping examples with missing values, and by binarizing categorical variables and some real-valued variables. We provide processed datasets and the code to process rearrest at http://github.com/ustunb/risk-slim. |
| Dataset Splits | Yes | We use nested 5-fold cross-validation (5-CV) to choose the free parameters of a final risk score (see Cawley and Talbot, 2010). |
| Hardware Specification | Yes | We solved each instance for at most 20 minutes on a 3.33 GHz CPU with 16 GB RAM using CPLEX 12.6.3 (ILOG, 2017). |
| Software Dependencies | Yes | We solved each instance for at most 20 minutes on a 3.33 GHz CPU with 16 GB RAM using CPLEX 12.6.3 (ILOG, 2017). We train PLR models using the glmnet package of Friedman et al. (2010). All MINLP algorithms were implemented in a state-of-the-art commercial solver (i.e., Artelsys Knitro 9.0, which is an updated version of the solver described in Byrd et al. 2006). |
| Experiment Setup | Yes | We formulated an instance of Risk Slim MINLP with the constraints: λ₀ ∈ {−100, …, 100}, λⱼ ∈ {−5, …, 5}, and ‖λ‖₀ ≤ R_max. We set the trade-off parameter to a small value C₀ = 10⁻⁶ to recover the sparsest model among equally accurate models (see Appendix B). |
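The Dataset Splits row reports that free parameters were chosen with nested 5-fold cross-validation. A minimal sketch of that general procedure is below, using scikit-learn with a placeholder synthetic dataset, a plain logistic regression model, and an illustrative parameter grid — none of which are the paper's actual data, model, or settings.

```python
# Sketch of nested 5-fold cross-validation: an inner CV loop selects free
# parameters, and an outer CV loop estimates the performance of the whole
# selection procedure, avoiding the optimistic bias of tuning and scoring
# on the same folds.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Placeholder data standing in for one of the processed benchmark datasets.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)  # parameter selection
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # performance estimation

# Inner loop: pick the best regularization strength on each training split.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=inner_cv,
)

# Outer loop: score the tuned procedure on held-out folds.
scores = cross_val_score(search, X, y, cv=outer_cv)
print(len(scores), scores.mean() > 0.5)
```

The same pattern applies when the inner loop selects a risk score's free parameters (e.g., coefficient bounds or the model-size limit) rather than a regularization strength.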