Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn
Authors: Sebastian Pölsterl
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present scikit-survival, a Python library for time-to-event analysis. It provides a tight integration with scikit-learn (Pedregosa et al., 2011), such that preprocessing and feature selection techniques within scikit-learn can be seamlessly combined with a model from scikit-survival. It provides efficient implementations of linear models, ensemble models, and survival support vector machines, as well as a range of evaluation metrics suitable for right-censored time-to-event data. |
| Researcher Affiliation | Academia | Sebastian Pölsterl EMAIL Artificial Intelligence in Medical Imaging (AI-Med), Department of Child and Adolescent Psychiatry, Ludwig-Maximilians-Universität, Munich, Germany |
| Pseudocode | Yes | Figure 1: Example of using scikit-survival (left) and its output (right). `import numpy as np` `import matplotlib.pyplot as plt` `from sklearn.model_selection import train_test_split` `from sklearn.pipeline import make_pipeline` `from sksurv.datasets import load_whas500` `from sksurv.linear_model import CoxPHSurvivalAnalysis` `from sksurv.preprocessing import OneHotEncoder` `# load example data` `data_x, data_y = load_whas500()` `# split the data` `X_train, X_test, y_train, y_test = train_test_split(data_x, data_y, test_size=50, random_state=2020)` `# combine feature transform and Cox model` `pipeline = make_pipeline(OneHotEncoder(), CoxPHSurvivalAnalysis())` `# fit the model` `pipeline.fit(X_train, y_train)` `# compute concordance index on held-out data` `c_index = pipeline.score(X_test, y_test)` `# plot estimated survival functions` `surv_fns = pipeline.predict_survival_function(X_test)` `time_points = np.arange(1, 1000)` `for surv_func in surv_fns: plt.step(time_points, surv_func(time_points), where="post")` `plt.ylabel("probability of survival $\hat{S}(t)$")` `plt.xlabel("time $t$")` `plt.title("concordance index = %.3f" % c_index)` `plt.show()` |
| Open Source Code | Yes | scikit-survival is distributed under the GPL-3 license with the source code and detailed instructions available at https://github.com/sebp/scikit-survival |
| Open Datasets | Yes | `# load example data` `data_x, data_y = load_whas500()` |
| Dataset Splits | Yes | `X_train, X_test, y_train, y_test = train_test_split(data_x, data_y, test_size=50, random_state=2020)` |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper mentions "Python 3.6 and above", "scikit-learn", "numpy", and "matplotlib". While Python has a version range, specific versions for scikit-learn, numpy, and matplotlib are not provided. |
| Experiment Setup | Yes | `X_train, X_test, y_train, y_test = train_test_split(data_x, data_y, test_size=50, random_state=2020)` `# combine feature transform and Cox model` `pipeline = make_pipeline(OneHotEncoder(), CoxPHSurvivalAnalysis())` |
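
The concordance index that the Figure 1 listing computes on held-out data can be illustrated without the library. The following is a minimal sketch in the style of Harrell's concordance index for right-censored data, not scikit-survival's implementation. The function name `concordance_index`, its arguments, and the toy data are hypothetical, and pairs with tied survival times are simply skipped, which is a simplifying assumption.

```python
def concordance_index(times, events, risk_scores):
    """Simplified concordance index for right-censored data.

    times       -- observed time for each subject (event or censoring time)
    events      -- True if the event was observed, False if censored
    risk_scores -- predicted risk; higher should mean earlier event

    Assumption: pairs with tied times are ignored (a simplification).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # Subject i can anchor a pair only if its event was actually observed.
        if not events[i]:
            continue
        for j in range(n):
            # Pair is comparable when i is known to fail before j's observed time.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # ordering of risks matches ordering of times
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data: four subjects, the second one is censored.
times = [5, 10, 12, 20]
events = [True, False, True, True]
risk = [0.9, 0.4, 0.1, 0.6]

c = concordance_index(times, events, risk)
print("concordance index = %.3f" % c)
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and 0.0 to a perfectly inverted ranking. Censored subjects still contribute as the later member of a pair, which is why the measure suits right-censored data.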