An Effective Meaningful Way to Evaluate Survival Models
Authors: Shi-Ang Qi, Neeraj Kumar, Mahtab Farrokh, Weijie Sun, Li-Hao Kuan, Rajesh Ranganath, Ricardo Henao, Russell Greiner
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted extensive experiments to evaluate the effectiveness of the proposed evaluation methods comparing the effectiveness of these 6 evaluation metrics for estimating the actual MAE of various survival models on a wide range of survival datasets. |
| Researcher Affiliation | Academia | 1Computing Science, University of Alberta, Edmonton, Canada 2Alberta Machine Intelligence Institute, Edmonton, Canada 3Computer Science & Center for Data Science, New York University, New York City, USA 4Biostatistics & Bioinformatics, Duke University, Durham, USA. |
| Pseudocode | No | No explicit pseudocode or algorithm block found. |
| Open Source Code | Yes | We also provide a code base for these MAE approaches, for this and other variants. ... Code to replicate all experiments can be found at https://github.com/shi-ang/CensoredMAE |
| Open Datasets | Yes | We apply this synthetic censoring to 5 real-world datasets: GBM, SUPPORT, METABRIC, MIMIC-IV (Johnson et al., 2022) all-cause mortality datasets (MIMIC-A) and MIMIC-IV hospital mortality datasets (MIMIC-H). Table 1 summarizes the characteristics of these five datasets, and Appendix E.1 contains information on the dataset preprocessing and MIMIC-IV datasets construction. ... GBM is retrieved from The Cancer Genome Atlas (TCGA) dataset (Weinstein et al., 2013). ... The data from TCGA can be found on http://firebrowse.org/ or by the instruction in Haider et al. (2020). ... The Study to Understand Prognoses Preferences Outcomes and Risks of Treatment (SUPPORT) dataset (Knaus et al., 1995)... The official website (https://biostat.app.vumc.org/wiki/Main/SupportDesc) for the SUPPORT dataset provides a guideline for imputing baseline physiologic features... The Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) (Curtis et al., 2012)... The dataset can be downloaded from (https://github.com/havakv/pycox), and it does not have any missing values. ... The Medical Information Mart for Intensive Care (MIMIC)-IV (Johnson et al., 2022) dataset is an update to MIMIC-III... |
| Dataset Splits | Yes | We split the data into a training set (80%) and a test set (20%) using a stratified 5-fold cross-validation (5CV) procedure (stratified wrt both time t and censor indicator δ). If the model requires a validation set for hyper-parameter tuning or early stopping, we will split 20% of the training set as the validation set. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions software packages like 'scikit-learn', 'lifelines', 'scikit-survival', and 'pycox', but does not provide specific version numbers for them. |
| Experiment Setup | Yes | We use the 100 boosting stages with partial likelihood loss for optimization. ... We use 50 trees with 3 minimal samples per leaf to fit the model. ... The number of discrete times is determined by the square root of numbers of uncensored patients, and use quantiles to divide those uncensored instances evenly into each time interval... The time interval is uniformly split from time zero to the last observed time... We will use a three-hidden-layer structure with dimensions of [50, 50, 50]. We set the number of components to 25, kept the probability for weights equal to 0.8, and set the sample size to 200. Early stopping is also performed with at least 10000 epochs for guaranteed improvement. ... The model architecture in the experiment has one hidden layer with a size of 15. The number of components is set to 15, and use residual as the initial type. The model is optimized via RMSprop optimizer with early stopping. |
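
The Dataset Splits and Experiment Setup rows describe two procedures that are easy to misread in prose: a 5-fold split stratified on both time t and censor indicator δ, and a discrete time grid whose size is the square root of the number of uncensored patients, with quantile-based cut points. The sketch below is one plausible reading of those descriptions, not the authors' code. The function names (`stratified_folds`, `train_val_test`, `quantile_time_grid`), the sort-then-deal stratification scheme, the random (unstratified) validation split, and the floor rounding of the square root are all assumptions made for illustration.

```python
import math
import random


def stratified_folds(times, events, n_splits=5, seed=0):
    """Deal instances into n_splits folds, balancing delta and time.

    Assumed scheme: sort by (censor indicator, time), then assign
    consecutive instances to consecutive folds (round-robin).
    """
    order = sorted(range(len(times)), key=lambda i: (events[i], times[i]))
    offset = random.Random(seed).randrange(n_splits)  # random starting fold
    folds = [[] for _ in range(n_splits)]
    for rank, idx in enumerate(order):
        folds[(rank + offset) % n_splits].append(idx)
    return folds


def train_val_test(folds, test_fold, val_fraction=0.2, seed=0):
    """Use one fold (20%) as test; hold out val_fraction of the rest.

    Assumption: the validation subset is drawn uniformly at random.
    """
    test = list(folds[test_fold])
    train = [i for k, fold in enumerate(folds) if k != test_fold for i in fold]
    random.Random(seed).shuffle(train)
    n_val = round(val_fraction * len(train))
    return train[n_val:], train[:n_val], test


def quantile_time_grid(times, events):
    """Return (number of bins, interior cut points) for discrete-time models.

    The bin count is the integer square root of the number of uncensored
    instances (floor rounding is an assumption); cut points are taken at
    evenly spaced quantiles of the uncensored event times.
    """
    uncensored = sorted(t for t, e in zip(times, events) if e == 1)
    n_bins = max(1, math.isqrt(len(uncensored)))
    cuts = [uncensored[(k * len(uncensored)) // n_bins] for k in range(1, n_bins)]
    return n_bins, cuts
```

Calling `stratified_folds(times, events)` once and then `train_val_test(folds, k)` for each `k` in `range(5)` yields the five train/validation/test partitions; models that need no validation set can simply merge the first two returned lists.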