Are Logistic Models Really Interpretable?

Authors: Danial Dervovic, Freddy Lecue, Nicolas Marchesotti, Daniele Magazzeni

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show via a User Study that skilled participants are unable to reliably reproduce the action of small LR models given the trained parameters. As an antidote to this, we define Linearised Additive Models (LAMs), an optimal piecewise linear approximation that augments any trained additive model equipped with a sigmoid link function, requiring no retraining. We argue that LAMs are more interpretable than logistic models: survey participants are shown to solve model reasoning tasks with LAMs much more accurately than with LR given the same information. Furthermore, we show that LAMs do not suffer from large performance penalties in terms of ROC-AUC and calibration with respect to their logistic counterparts on a broad suite of public financial modelling data. (Sections cited: 3 Performance Comparison; 4 User Survey.) A hedged sketch of the piecewise linear link follows this table.
Researcher Affiliation | Industry | JP Morgan AI Research: Edinburgh, UK; New York City, NY, USA; London, UK
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology or a link to a code repository.
Open Datasets | Yes | We consider publicly available datasets from the UCI repository [Kelly et al.], namely the German Credit dataset [Hofmann, 1994], Australian credit approvals [Quinlan, 1987], Taiwanese bankruptcy prediction [Liang et al., 2020], Japanese credit screening [Sano, 1992] and the Polish companies bankruptcy dataset [Tomczak, 2016]. We also consider the FICO Home Equity Line of Credit (HELOC) [FICO, 2018], Give Me Some Credit (GMSC) and Lending Club (LC) [Kaggle, 2019] datasets. A fetch example for one of these datasets follows the table.
Dataset Splits | Yes | For each metric, the 10-fold stratified cross-validation score is computed for every (classifier, dataset) combination. A minimal sketch of this protocol follows the table.
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The paper mentions using XGBoost and other models but does not give version numbers for any software dependency (e.g., Python, PyTorch, scikit-learn, or XGBoost itself). A snippet for recording the versions used in a reproduction follows the table.
Experiment Setup | No | The paper describes the models and datasets used but does not provide specific details about the experimental setup, such as hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or optimizer configurations.
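
The Research Type row paraphrases the LAM construction only in prose. The sketch below is a minimal illustration of the underlying idea, assuming a hand-picked set of knots: it replaces the sigmoid link of a trained logistic regression with a piecewise linear interpolant. The paper's LAMs use an optimal breakpoint construction, so the knot locations, tail handling, and helper names here are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch only: a piecewise linear surrogate for the sigmoid
# link of a trained logistic model. Knot placement is hand-picked here,
# whereas the paper derives an optimal piecewise linear approximation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
lr = LogisticRegression(max_iter=1000).fit(X, y)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical knots on the linear predictor, with sigmoid values at each.
knots = np.array([-4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0])
knot_vals = sigmoid(knots)

def piecewise_linear_proba(model, X):
    """Apply a piecewise linear link to the model's linear predictor.
    np.interp is linear between knots and clips outside the knot range."""
    z = X @ model.coef_.ravel() + model.intercept_[0]
    return np.interp(z, knots, knot_vals)

p_lr = lr.predict_proba(X)[:, 1]
p_pl = piecewise_linear_proba(lr, X)
print("max |sigmoid - piecewise| probability gap:", np.abs(p_lr - p_pl).max())
```

Because only the link function changes, no retraining is needed, which matches the paper's claim that LAMs augment an already-trained additive model.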
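For the Open Datasets row, one of the cited datasets can be pulled from OpenML's mirror of the UCI German Credit data. The dataset name "credit-g", its version, and the target encoding below are assumptions about the mirror, not details taken from the paper.

```python
# Sketch: fetch the UCI German Credit dataset via its OpenML mirror.
# "credit-g"/version=1 and the "bad" target label are assumptions.
from sklearn.datasets import fetch_openml

german = fetch_openml("credit-g", version=1, as_frame=True)
X = german.data
y = (german.target == "bad").astype(int)  # 1 = bad credit risk
print(X.shape, y.mean())
```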
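The Dataset Splits row describes 10-fold stratified cross-validation per (classifier, dataset) pair. A minimal scikit-learn sketch follows; whether the authors shuffled folds or fixed a random seed is not stated, so those settings are assumptions.

```python
# Sketch of the stated protocol: 10-fold stratified CV of ROC-AUC for one
# (classifier, dataset) combination. shuffle/random_state are assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=cv, scoring="roc_auc")
print(f"ROC-AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```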
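Since the Software Dependencies row notes that no versions are pinned, anyone reproducing the experiments can at least record the environment they ran. The package list below is an assumption about what a reproduction would use.

```python
# Record installed versions of the (assumed) key dependencies.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("scikit-learn", "xgboost", "numpy", "pandas"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```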