Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Kernelized Stein Discrepancy Tests of Goodness-of-fit for Time-to-Event Data

Authors: Tamara Fernandez, Nicolas Rivera, Wenkai Xu, Arthur Gretton

ICML 2020 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical studies and results are presented in Section 6, where we compare with a recent state-of-the-art non-parametric test for censored data by Fernandez & Gretton (2019) based on the MMD, which has been shown to outperform classical tests. Our experimental results show that our proposed methods perform better than existing tests, including previous tests based on a kernelized maximum mean discrepancy.
Researcher Affiliation	Academia	(1) Gatsby Computational Neuroscience Unit, University College London, United Kingdom; (2) Department of Computer Science and Technology, University of Cambridge, United Kingdom.
Pseudocode	No	No pseudocode or algorithm blocks are present in the paper.
Open Source Code	No	The paper does not provide an explicit statement or link for open-source code related to the described methodology.
Open Datasets	Yes	aml: Acute Myelogenous Leukemia survival dataset (Miller Jr, 2011); cgd: Chronic Granulomatous Disease dataset (Fleming & Harrington, 2011); ovarian: Ovarian Cancer Survival dataset (Edmonson et al., 1979); lung: North Central Cancer Treatment Group (NCCTG) Lung Cancer dataset (Loprinzi et al., 1994); stanford: Stanford Heart Transplant Data (Crowley & Hu, 1977); nafld: Non-alcohol fatty liver disease (NAFLD) (Allen et al., 2018). Table content quoted alongside, in extraction order: p-values for aml, cgd, ovarian: Exponential 0.585, 0.460, 0.681; Weibull (shape=2) 0.001, 0.002, 0.063. "Table 1. Real data applications on testing hazard proportionality." Dataset, covariates, p-value: lung, Age, 0.167; stanford, T5 mismatch score, 0.594; nafld, Weight and Gender, 0.108. "Table 2. Real data applications on testing goodness of fit."
Dataset Splits	No	The paper mentions 'splitting the data into training set and test sets' but does not specify a validation set or the split percentages needed for reproduction.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers needed to replicate the experiment.
Experiment Setup	Yes	In all our experiments we choose the null as an exponential distribution of rate 1, and in this case we can check that sKSD and mKSD coincide. Additionally, we implement mKSDu, which is given by the test mKSD applied to the transformed data $((F_0(T_i), \Delta_i))_{i=1}^n$ to test $H_0 : F_0(X) \sim U(0, 1)$. Finally, we use a Gaussian kernel with length-scale chosen by using the median-heuristic, which is the median of all the absolute differences between two different data points.
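
The Experiment Setup entry describes a median-heuristic length-scale and a transformation of the observed times through the null CDF. The following is a minimal sketch of those two steps only, not the authors' implementation (the entry above notes that no code was released). The function names, the example times, and the kernel parametrization exp(-d^2 / (2 * length_scale^2)) are assumptions made for illustration; the paper excerpt does not state which Gaussian parametrization it uses.

```python
import numpy as np


def median_heuristic(times):
    """Median of |t_i - t_j| over all pairs of distinct data points (i < j)."""
    t = np.asarray(times, dtype=float)
    i, j = np.triu_indices(len(t), k=1)  # index pairs with i < j
    return float(np.median(np.abs(t[i] - t[j])))


def gaussian_kernel(times, length_scale):
    """Gram matrix of a Gaussian kernel.

    Assumed parametrization: k(s, t) = exp(-(s - t)^2 / (2 * length_scale^2)).
    """
    t = np.asarray(times, dtype=float)
    diff = t[:, None] - t[None, :]
    return np.exp(-diff**2 / (2.0 * length_scale**2))


def exp1_cdf(times):
    """CDF of the exponential distribution with rate 1: F0(t) = 1 - exp(-t)."""
    return 1.0 - np.exp(-np.asarray(times, dtype=float))


# Hypothetical observed times, for illustration only.
times = np.array([0.5, 1.0, 2.0, 4.0])

ell = median_heuristic(times)      # length-scale from the median heuristic
K = gaussian_kernel(times, ell)    # kernel matrix on the raw times
u = exp1_cdf(times)                # transformed times F0(T_i), each in (0, 1)
```

The censoring indicators are left unchanged by the transformation, so only the times pass through `exp1_cdf`; the test statistic itself is not reproduced here.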