Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn
Authors: Sebastian Pölsterl
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present scikit-survival, a Python library for time-to-event analysis. It provides a tight integration with scikit-learn (Pedregosa et al., 2011), such that preprocessing and feature selection techniques within scikit-learn can be seamlessly combined with a model from scikit-survival. It provides efficient implementations of linear models, ensemble models, and survival support vector machines, as well as a range of evaluation metrics suitable for right-censored time-to-event data. |
| Researcher Affiliation | Academia | Sebastian Pölsterl EMAIL Artificial Intelligence in Medical Imaging (AI-Med), Department of Child and Adolescent Psychiatry, Ludwig-Maximilians-Universität, Munich, Germany |
| Pseudocode | Yes | Figure 1: Example of using scikit-survival (left) and its output (right). `import numpy as np`; `import matplotlib.pyplot as plt`; `from sklearn.model_selection import train_test_split`; `from sklearn.pipeline import make_pipeline`; `from sksurv.datasets import load_whas500`; `from sksurv.linear_model import CoxPHSurvivalAnalysis`; `from sksurv.preprocessing import OneHotEncoder`; `data_x, data_y = load_whas500()  # load example data`; `X_train, X_test, y_train, y_test = train_test_split(data_x, data_y, test_size=50, random_state=2020)  # split the data`; `pipeline = make_pipeline(OneHotEncoder(), CoxPHSurvivalAnalysis())  # combine feature transform and Cox model`; `pipeline.fit(X_train, y_train)  # fit the model`; `c_index = pipeline.score(X_test, y_test)  # compute concordance index on held-out data`; `surv_fns = pipeline.predict_survival_function(X_test)  # plot estimated survival functions`; `time_points = np.arange(1, 1000)`; `for surv_func in surv_fns: plt.step(time_points, surv_func(time_points), where="post")`; `plt.ylabel(r"probability of survival $\hat{S}(t)$")`; `plt.xlabel("time $t$")`; `plt.title("concordance index = %.3f" % c_index)`; `plt.show()` |
| Open Source Code | Yes | scikit-survival is distributed under the GPL-3 license with the source code and detailed instructions available at https://github.com/sebp/scikit-survival |
| Open Datasets | Yes | `data_x, data_y = load_whas500()  # load example data` |
| Dataset Splits | Yes | `X_train, X_test, y_train, y_test = train_test_split(data_x, data_y, test_size=50, random_state=2020)` |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper mentions "Python 3.6 and above", "scikit-learn", "numpy", and "matplotlib". A minimum Python version is given, but no versions are specified for scikit-learn, numpy, or matplotlib. |
| Experiment Setup | Yes | `X_train, X_test, y_train, y_test = train_test_split(data_x, data_y, test_size=50, random_state=2020)`; `pipeline = make_pipeline(OneHotEncoder(), CoxPHSurvivalAnalysis())  # combine feature transform and Cox model` |
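
The `pipeline.score` call in the Figure 1 example reports a concordance index on the 50 held-out samples. The following minimal pure-Python sketch of Harrell's concordance index for right-censored data illustrates what such a metric measures. The function name `harrell_c_index` is hypothetical. Pairs with tied observed times are simply skipped, which is a simplifying assumption. scikit-survival's own `sksurv.metrics.concordance_index_censored` may handle ties differently, so the two need not agree exactly.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(event: Sequence[bool], time: Sequence[float],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data (simplified sketch).

    A pair is comparable when the sample with the shorter observed time
    experienced the event. It is concordant when that sample also has the
    higher risk score (higher risk = expected shorter survival). Tied risk
    scores count as half-concordant. Pairs with tied observed times are
    skipped, an assumption made to keep the sketch short.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(time)), 2):
        if time[i] == time[j]:
            continue  # simplification: ignore tied observed times
        # order the pair so that `a` has the shorter observed time
        a, b = (i, j) if time[i] < time[j] else (j, i)
        if not event[a]:
            continue  # `a` was censored first, so the true ordering is unknown
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Consider events at times 2, 5 and 8, a sample censored at time 6, and risk scores 0.9, 0.4, 0.5 and 0.1 (in that order). There are five comparable pairs, and four of them are ordered correctly, so `harrell_c_index([True, True, False, True], [2, 5, 6, 8], [0.9, 0.4, 0.5, 0.1])` gives 0.8. The pair formed by the censored sample and the later event is not comparable, because the censored sample's true event time is unknown.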
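
In the Figure 1 example, each predicted survival function is evaluated on a grid with `surv_func(time_points)` and drawn with `plt.step(..., where="post")`. That setting holds each value constant until the next time point. The sketch below shows one way to evaluate such a right-continuous step function with `np.searchsorted`. It assumes the curve is stored as sorted event times plus the survival probability just after each of them. `eval_step_survival` is a hypothetical helper and is not scikit-survival's `StepFunction` implementation.

```python
import numpy as np


def eval_step_survival(event_times, surv_probs, t):
    """Evaluate a right-continuous survival step function at times `t`.

    Assumes `event_times` is sorted ascending and `surv_probs[k]` is the
    survival probability on [event_times[k], event_times[k + 1]).
    Before the first event time the survival probability is 1.0.
    """
    event_times = np.asarray(event_times, dtype=float)
    surv_probs = np.asarray(surv_probs, dtype=float)
    t = np.asarray(t, dtype=float)
    # count of event times <= t; 0 means "before the first event"
    idx = np.searchsorted(event_times, t, side="right")
    # prepend S = 1.0 so that idx == 0 maps to the pre-event plateau
    values = np.concatenate(([1.0], surv_probs))
    return values[idx]
```

For example, `eval_step_survival([2, 5, 9], [0.9, 0.6, 0.2], [0, 2, 3, 5, 10])` returns the values 1.0, 0.9, 0.9, 0.6 and 0.2. The curve jumps exactly at each event time, which is the behaviour `where="post"` draws.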
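
The Software Dependencies row is marked "No" because the paper states only a minimum Python version. When re-running the example, one lightweight way to close that gap is to record the installed versions alongside the results. The sketch below uses the standard-library `importlib.metadata` module, which requires Python 3.8 or newer. The helper name `dependency_versions` and the package list are illustrative assumptions, not part of the paper.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional


def dependency_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    # PyPI distribution names of the packages used in the Figure 1 example
    print("python", platform.python_version())
    for pkg, version in dependency_versions(
            ["scikit-survival", "scikit-learn", "numpy", "matplotlib"]).items():
        print(pkg, version or "not installed")
```

The output can be saved next to the reported concordance index so that a later run can be compared against the same library versions.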