Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn

Authors: Sebastian Pölsterl

JMLR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this paper, we present scikit-survival, a Python library for time-to-event analysis. It provides a tight integration with scikit-learn (Pedregosa et al., 2011), such that preprocessing and feature selection techniques within scikit-learn can be seamlessly combined with a model from scikit-survival. It provides efficient implementations of linear models, ensemble models, and survival support vector machines, as well as a range of evaluation metrics suitable for right censored time-to-event data.
Researcher Affiliation	Academia	Sebastian Pölsterl, Artificial Intelligence in Medical Imaging (AI-Med), Department of Child and Adolescent Psychiatry, Ludwig-Maximilians-Universität, Munich, Germany
Pseudocode	Yes	Figure 1: Example of using scikit-survival (left) and its output (right). 1 import numpy as np 2 import matplotlib.pyplot as plt 3 from sklearn.model_selection import train_test_split 4 from sklearn.pipeline import make_pipeline 5 from sksurv.datasets import load_whas500 6 from sksurv.linear_model import CoxPHSurvivalAnalysis 7 from sksurv.preprocessing import OneHotEncoder 9 # load example data 10 data_x, data_y = load_whas500() 11 # split the data 12 X_train, X_test, y_train, y_test = train_test_split( 13 data_x, data_y, test_size=50, random_state=2020) 15 # combine feature transform and Cox model 16 pipeline = make_pipeline( 17 OneHotEncoder(), CoxPHSurvivalAnalysis()) 18 # fit the model 19 pipeline.fit(X_train, y_train) 20 # compute concordance index on held-out data 21 c_index = pipeline.score(X_test, y_test) 23 # plot estimated survival functions 24 surv_fns = pipeline.predict_survival_function(X_test) 25 time_points = np.arange(1, 1000) 26 for surv_func in surv_fns: 27 plt.step(time_points, surv_func(time_points), 28 where="post") 29 plt.ylabel("probability of survival $\hat{S}(t)$") 30 plt.xlabel("time $t$") 31 plt.title("concordance index = %.3f" % c_index) 32 plt.show()
Open Source Code	Yes	scikit-survival is distributed under the GPL-3 license with the source code and detailed instructions available at https://github.com/sebp/scikit-survival
Open Datasets	Yes	9 # load example data 10 data_x, data_y = load_whas500()
Dataset Splits	Yes	12 X_train, X_test, y_train, y_test = train_test_split( 13 data_x, data_y, test_size=50, random_state=2020)
Hardware Specification	No	The paper does not provide specific hardware details used for running its experiments.
Software Dependencies	No	The paper mentions "Python 3.6 and above", "scikit-learn", "numpy", and "matplotlib". While Python has a version range, specific versions for scikit-learn, numpy, and matplotlib are not provided.
Experiment Setup	Yes	12 X_train, X_test, y_train, y_test = train_test_split( 13 data_x, data_y, test_size=50, random_state=2020) 15 # combine feature transform and Cox model 16 pipeline = make_pipeline( 17 OneHotEncoder(), CoxPHSurvivalAnalysis())
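
The Figure 1 excerpt in the table scores the fitted pipeline with a concordance index on held-out data. As a rough illustration of what that metric measures for right-censored data, the following is a minimal pure-Python sketch of a Harrell-style concordance index. It is not scikit-survival's implementation (the library ships its own metric in sksurv.metrics); the function name `concordance_index`, the simplified handling of tied event times, and the toy data are assumptions made here for illustration only.

```python
from typing import Sequence


def concordance_index(
    event: Sequence[bool],
    time: Sequence[float],
    risk: Sequence[float],
) -> float:
    """Simplified Harrell-style concordance index (hypothetical helper).

    event[i] is True if the event was observed for sample i, False if the
    sample was right-censored; time[i] is the observed time; risk[i] is the
    predicted risk score (higher means the event is expected sooner).
    Assumption: pairs with identical observed times are skipped entirely.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            # A censored sample cannot be the earlier member of a pair,
            # because its true event time is unknown.
            continue
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # earlier event has higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration: the third sample is censored.
event = [True, True, False, True]
time = [2.0, 5.0, 6.0, 9.0]
risk = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(event, time, risk)
print("concordance index = %.3f" % c_index)
```

In this toy example five pairs are comparable and four of them are ordered correctly by the risk scores, so the sketch reports 0.800. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering.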