Conformalized Survival Distributions: A Generic Post-Process to Increase Calibration
Authors: Shi-Ang Qi, Yakun Yu, Russell Greiner
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide theoretical guarantees for the above claim, and rigorously validate the efficiency of our approach across 11 real-world datasets, showcasing its practical applicability and robustness in diverse scenarios. Section 4 presents the extensive empirical analysis across 11 real-world survival datasets and shows the effectiveness of CSD. |
| Researcher Affiliation | Academia | 1Computing Science, University of Alberta, Edmonton, Canada 2Electrical and Computer Engineering, University of Alberta, Edmonton, Canada 3Alberta Machine Intelligence Institute, Edmonton, Canada. |
| Pseudocode | Yes | The pseudo-algorithm for computing the CSD with the KM-sampling process is presented at Algorithm 1 in Appendix D.2. |
| Open Source Code | Yes | A Python implementation of CSD is available online at https://github.com/shi-ang/CSD, along with code to replicate all experiments. |
| Open Datasets | Yes | The Veterans Administration Lung Cancer Trial (VALCT) dataset (Kalbfleisch & Prentice, 2011) is derived from a randomized trial comparing two treatment regimens for lung cancer... The Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset (Curtis et al., 2012) contains survival information for breast cancer patients... Surveillance, Epidemiology, and End Results (SEER) Program dataset (National Cancer Institute, DCCPS, Surveillance Research Program, 2015) is a comprehensive collection of data on cancer patients in the United States. |
| Dataset Splits | Yes | We split each dataset into a training set (90%) and a testing set (10%) using a stratified splitting procedure that balances both the time ti and the censor indicator δi. For algorithms that require a validation set to tune hyperparameters or to early stop, we partition another balanced 10% validation set from the training set. |
| Hardware Specification | No | The paper describes the software models and experimental setup, but it does not specify any particular hardware components such as CPU or GPU models, or detailed cloud computing specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions several software packages such as the "lifelines package (Davidson-Pilon, 2024)", "scikit-survival package (Pölsterl, 2020)", "torchmtlr", and "pycox package (Kvamme et al., 2019)", but does not provide explicit version numbers for these libraries or for Python itself. |
| Experiment Setup | Yes | For the training process, we utilize Adam optimizer combined with an L2 penalty for weight decay to fine-tune the models. The learning parameters are set as follows: a learning rate of 0.001, a batch size of 256, and a dropout rate of 0.4. Additionally, we implement an early stopping mechanism across all deep learning models, which is based on performance validation using a separate validation dataset. |
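The stratified splitting procedure quoted above (balancing both the time `t_i` and the censor indicator `δ_i`) is not spelled out in the excerpt; one common way to realize it is to stratify on the censor indicator crossed with time-quantile bins. The sketch below illustrates that idea with synthetic data and scikit-learn's `train_test_split`; the binning scheme and variable names are assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic survival data (hypothetical stand-in for a real dataset):
# t = observed times, d = censor indicators (1 = event, 0 = censored).
rng = np.random.default_rng(0)
t = rng.exponential(10.0, size=500)
d = rng.integers(0, 2, size=500)
X = rng.normal(size=(500, 5))

# Cross the censor indicator with quartile bins of the time variable,
# so the split balances both quantities at once (an assumed scheme).
time_bins = np.quantile(t, [0.25, 0.5, 0.75])
strata = d * 4 + np.digitize(t, time_bins)

# 90% / 10% train/test split, stratified on the combined label,
# matching the proportions reported in the paper.
X_train, X_test, t_train, t_test, d_train, d_test = train_test_split(
    X, t, d, test_size=0.1, stratify=strata, random_state=42
)
```

A second stratified call on `(X_train, t_train, d_train)` would carve out the balanced 10% validation set mentioned for hyperparameter tuning and early stopping.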
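The reported training configuration (Adam with L2 weight decay, learning rate 0.001, batch size 256, dropout 0.4, early stopping on a validation set) can be sketched in PyTorch as below. The network architecture, loss, weight-decay coefficient, and patience are placeholders chosen for illustration; only the hyperparameters named in the quote come from the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in model; the paper's survival models differ.
model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Dropout(0.4),            # dropout rate 0.4, as reported
    nn.Linear(32, 1),
)
# Adam with an L2 penalty via weight_decay; the coefficient is assumed.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.MSELoss()          # placeholder; not the paper's survival loss

# Synthetic training and validation data.
X, y = torch.randn(512, 10), torch.randn(512, 1)
X_val, y_val = torch.randn(64, 10), torch.randn(64, 1)

best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(100):
    model.train()
    for i in range(0, len(X), 256):       # batch size 256, as reported
        xb, yb = X[i:i + 256], y[i:i + 256]
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(X_val), y_val).item()
    if val_loss < best_val - 1e-4:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:        # early stopping on validation loss
            break
```

The early-stopping criterion here (patience on validation loss) is one standard choice; the excerpt only states that validation-based early stopping was used, not its exact form.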