Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning Optimal Predictive Checklists

Authors: Haoran Zhang, Quaid Morris, Berk Ustun, Marzyeh Ghassemi

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we present a method to learn checklists for clinical decision support. We benchmark the performance of our method on seven clinical classification problems, and demonstrate its practical benefits by training a short-form checklist for PTSD screening. Our results show that our method can fit simple predictive checklists that perform well and that can easily be customized to obey a rich class of custom constraints.
Researcher Affiliation | Academia | Haoran Zhang, Massachusetts Institute of Technology, EMAIL; Quaid Morris, Memorial Sloan Kettering Cancer Center, EMAIL; Berk Ustun*, UC San Diego, EMAIL; Marzyeh Ghassemi*, Massachusetts Institute of Technology, EMAIL
Pseudocode | Yes | Algorithm 1: Sequential Training over N and M
Open Source Code | Yes | We provide a Python package to train and customize predictive checklists with open-source and commercial solvers, including CBC [29] and CPLEX [20] (see https://github.com/MLforHealth/predictive_checklists).
Open Datasets | Yes | We consider seven clinical classification tasks shown in Table 2. For each task, we create a classification dataset by using each of the following techniques to binarize features. Fixed: we convert continuous and ordinal features into threshold indicator variables using the median, and convert each categorical feature into an indicator for its most common category. Adaptive: we convert continuous and ordinal features into 4 threshold indicator variables using quintiles as thresholds, and convert each categorical feature with a one-hot encoding. Optbinning: we use the method proposed by Navas-Palencia [53] to binarize all features. To address class imbalance, we oversample the minority class of each dataset to equalize the number of positive and negative examples.
Dataset Splits | Yes | We use 5-fold cross-validation and report the mean, minimum, and maximum test error across the five folds. We use the training set of each fold to fit a predictive checklist that contains at most N = 8 items, and that is required to select at most 1 item from each feature group.
Hardware Specification | Yes | We solve this problem using CPLEX 12.10 [20], paired with the computational improvements in Section 4, on a 2.4 GHz CPU with 16 GB RAM for 60 minutes.
Software Dependencies | Yes | We provide a Python package to train and customize predictive checklists with open-source and commercial solvers, including CBC [29] and CPLEX [20] (see https://github.com/MLforHealth/predictive_checklists).
Experiment Setup | No | The paper specifies the use of 5-fold cross-validation and a solver time limit ("for 60 minutes"), but it does not provide concrete hyperparameter values or detailed training configurations (e.g., specific learning rates, batch sizes, or optimization algorithm settings) for the models.
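The Fixed and Adaptive binarization schemes quoted in the Open Datasets row can be sketched as follows. This is a minimal illustration, not the paper's code: the feature names, data, and helper functions are invented, and the quintile thresholds (20th–80th percentiles) are our reading of "4 threshold indicator variables using quintiles as thresholds".

```python
import numpy as np
import pandas as pd

def binarize_fixed(s: pd.Series) -> pd.DataFrame:
    """Fixed scheme: one indicator per continuous/ordinal feature,
    thresholded at the median."""
    return pd.DataFrame({f"{s.name}>=median": (s >= s.median()).astype(int)})

def binarize_adaptive(s: pd.Series) -> pd.DataFrame:
    """Adaptive scheme: four indicators per continuous/ordinal feature,
    using the quintiles (20th-80th percentiles) as thresholds."""
    return pd.DataFrame(
        {f"{s.name}>={t:.3g}": (s >= t).astype(int)
         for t in s.quantile([0.2, 0.4, 0.6, 0.8])}
    )

def one_hot(s: pd.Series) -> pd.DataFrame:
    """Adaptive scheme for categorical features: one-hot encoding."""
    return pd.get_dummies(s, prefix=s.name).astype(int)

# Toy data (invented, for illustration only)
age = pd.Series(range(1, 11), name="age")
sex = pd.Series(["F", "M", "F", "M", "F", "F", "M", "F", "M", "F"], name="sex")

fixed = binarize_fixed(age)        # 1 indicator column
adaptive = binarize_adaptive(age)  # 4 indicator columns
cats = one_hot(sex)                # one column per category
```

Each resulting column is a binary item that a learned checklist can either include or drop, which is why the binarization choice matters for checklist expressiveness.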
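The evaluation protocol in the Dataset Splits row (5-fold cross-validation, reporting mean/minimum/maximum test error) can be sketched with a toy stand-in for the checklist model. The stand-in below is an M-of-N rule over median-threshold items, which mimics how a predictive checklist scores a patient; the paper's actual checklists are fit by integer programming, and the data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))     # synthetic features (stand-in data)
y = (X[:, 0] > 0).astype(int)     # synthetic binary label

# 5-fold split: shuffle the indices, then cut them into five near-equal folds
idx = rng.permutation(len(y))
folds = np.array_split(idx, 5)

errors = []
for k in range(5):
    test_idx = folds[k]
    train_idx = np.concatenate([folds[j] for j in range(5) if j != k])
    # Stand-in "checklist": items are median-threshold indicators on the
    # first two features; predict positive when at least M = 1 item fires.
    thresholds = np.median(X[train_idx, :2], axis=0)
    items = (X[test_idx, :2] >= thresholds).astype(int)
    preds = (items.sum(axis=1) >= 1).astype(int)
    errors.append(float(np.mean(preds != y[test_idx])))

# Report mean, minimum, and maximum test error across the five folds
summary = {"mean": float(np.mean(errors)),
           "min": float(np.min(errors)),
           "max": float(np.max(errors))}
```

Reporting the min/max alongside the mean, as the paper does, exposes fold-to-fold variance that a single averaged number would hide.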