Improving Decision Sparsity

Authors: Yiyang Sun, Tong Wang, Cynthia Rudin

NeurIPS 2024

Reproducibility assessment (variable, result, and supporting LLM response):

Research Type: Experimental
To evaluate whether our proposed methods would achieve sparser, more credible, and closer explanations, we present experiments on seven datasets: (i) the UCI Adult Income dataset for predicting income levels [Dua and Graff, 2017], (ii) the FICO Home Equity Line of Credit dataset for assessing credit risk, used for the Explainable Machine Learning Challenge [FICO, 2018], (iii) the UCI German Credit dataset for determining creditworthiness [Dua and Graff, 2017], (iv) the MIMIC-III dataset for predicting patient outcomes in intensive care units [Johnson et al., 2016a,b], (v) the COMPAS dataset [Larson and Angwin, 2016; Wang et al., 2022a] for predicting recidivism, (vi) the Diabetes dataset [Strack et al., 2014] for predicting whether patients will be readmitted within two years, and (vii) the Headline dataset for predicting whether a headline is likely to be shared by readers [Chen et al., 2023].

Researcher Affiliation: Academia
Yiyang Sun (Duke University), Tong Wang (Yale University), Cynthia Rudin (Duke University).

Pseudocode: Yes
Algorithm 1: Reference Search for Flexible SEV (Appendix D). Algorithm 2: Preprocessing, the information collection process for SEVT (Appendix E). Algorithm 3: Efficient SEVT Calculation, Negative Pathways Check (Appendix E).

Open Source Code: Yes
The code for training and evaluation is provided in the Experiment folder, and the scripts for running it in the Script folder.

Open Datasets: Yes
UCI Adult Income dataset for predicting income levels [Dua and Graff, 2017], FICO Home Equity Line of Credit dataset for assessing credit risk [FICO, 2018], MIMIC-III dataset for predicting patient outcomes in intensive care units [Johnson et al., 2016a,b], COMPAS dataset [Larson and Angwin, 2016; Wang et al., 2022a], Diabetes dataset [Strack et al., 2014], and Headline dataset [Chen et al., 2023].
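
The paper does not say how these public datasets were retrieved. As one hedged example, a copy of the UCI Adult dataset can be pulled from OpenML through scikit-learn; the dataset name and OpenML version below are assumptions about which mirror matches the authors' copy, not their actual pipeline.

```python
from sklearn.datasets import fetch_openml

# Fetch a public mirror of the UCI Adult Income dataset from OpenML.
# The paper does not specify its download source, so this loader (and the
# OpenML version number) is an assumption, not the authors' pipeline.
adult = fetch_openml("adult", version=2, as_frame=True)
X, y = adult.data, adult.target
print(X.shape)           # feature matrix dimensions
print(y.value_counts())  # class balance for the income label
```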

Dataset Splits: No
The datasets were divided into training and test sets using an 80-20 stratified split. The paper specifies the train and test portions but does not report percentages or counts for a separate validation split.
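
A minimal sketch of the reported 80-20 stratified split, assuming scikit-learn's train_test_split; the feature matrix, labels, and random seed are placeholders, since the paper reports neither a seed nor a validation split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and binary labels standing in for one of the
# tabular datasets; the shapes here are illustrative only.
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=1000)

# 80-20 stratified train/test split as described in the paper. No separate
# validation split is reported, and random_state is an assumed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
```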

Hardware Specification: Yes
All models were trained on an RTX 2080 Ti GPU with 4 cores of an Intel(R) Xeon(R) Gold 6226 CPU @ 2.70GHz.

Software Dependencies: No
Baseline models were fit using sklearn [Pedregosa et al., 2011] implementations in Python, and the resulting loss was minimized via gradient descent in PyTorch [Paszke et al., 2019]. The paper names these packages but does not give version numbers for them.
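
Because the versions go unreported, a reproduction has to pin them down itself; a small sketch of logging the installed releases of the two named packages follows (our addition, not something the paper does).

```python
# Log the installed versions of the dependencies the paper names,
# since it cites sklearn and PyTorch without stating their releases.
import sklearn
import torch

print("scikit-learn:", sklearn.__version__)
print("torch:", torch.__version__)
```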

Experiment Setup: Yes
The 2-layer MLP used ReLU activations and consisted of two fully connected layers with 128 nodes each; it was trained with early stopping. The gradient-boosted classifier used 200 trees with a max depth of 3. The loss was minimized via gradient descent in PyTorch [Paszke et al., 2019] with a batch size of 128, a learning rate of 0.1, and the Adam optimizer. The first 80 training epochs are warm-up epochs optimizing only the binary cross-entropy classification loss (BCELoss); the next 20 epochs add the All-Opt terms.
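
A sketch of this setup under stated assumptions: the synthetic data, the input width, the reading of "2-layer MLP" as two 128-unit hidden layers before a sigmoid output, and the all_opt_penalty stub are all placeholders; the paper's actual All-Opt terms and its early-stopping criterion are not reproduced here.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in data; shapes are illustrative, not from the paper.
X = torch.rand(1024, 20)
y = torch.randint(0, 2, (1024,)).float()
loader = DataLoader(TensorDataset(X, y), batch_size=128, shuffle=True)

# Gradient-boosted baseline with the reported 200 trees of max depth 3.
gbt = GradientBoostingClassifier(n_estimators=200, max_depth=3)
gbt.fit(X.numpy(), y.numpy())

# MLP with two fully connected hidden layers of 128 ReLU units each,
# plus a sigmoid output head for binary classification.
model = nn.Sequential(
    nn.Linear(X.shape[1], 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # reported optimizer/lr
bce = nn.BCELoss()

def all_opt_penalty(model, xb):
    # Hypothetical stand-in for the paper's All-Opt terms, which are not
    # reproduced here; it contributes nothing to the loss in this sketch.
    return torch.tensor(0.0)

for epoch in range(100):  # 80 warm-up epochs + 20 epochs adding All-Opt terms
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = bce(model(xb).squeeze(1), yb)
        if epoch >= 80:
            loss = loss + all_opt_penalty(model, xb)
        loss.backward()
        optimizer.step()
```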