Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Functional Martingale Residual Process for High-Dimensional Cox Regression with Model Averaging
Authors: Baihua He, Yanyan Liu, Yuanshan Wu, Guosheng Yin, Xingqiu Zhao
JMLR 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The performances of the proposed model averaging procedures are evaluated via extensive simulation studies, demonstrating that our methods achieve superior prediction accuracy over the existing regularization methods. As an illustration, we apply the proposed methods to the mantle cell lymphoma study. |
| Researcher Affiliation | Academia | Baihua He and Yanyan Liu, School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China; Yuanshan Wu, School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, China; Guosheng Yin, Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong; Xingqiu Zhao, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong |
| Pseudocode | Yes | Algorithm 1 Greedy model averaging algorithm based on the ECV criterion |
| Open Source Code | No | The paper does not explicitly state that source code is provided or offer a link to a repository for the described methodology. |
| Open Datasets | Yes | As an illustration, we apply the proposed model averaging approaches to the mantle cell lymphoma (MCL) study, which was also analyzed by Rosenwald et al. (2003). The gene expression data set is available from http://llmpp.nih.gov/MCL/ |
| Dataset Splits | Yes | The delete-one CV procedure, which is also called the n-fold CV, is advocated for the proposed model averaging methods. Nevertheless, our methods can be readily coupled with general ν-fold CV with ν < n. We consider ν = 5 and 10 to investigate the performances of the proposed methods. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models) used for running the experiments or simulations. |
| Software Dependencies | No | The paper mentions various regularization methods like LASSO, MCP, SCAD, Elastic Net, Ridge, and ALASSO, but does not specify any software names with version numbers. |
| Experiment Setup | Yes | We generate survival time Ti from the Cox proportional hazards model, λ(t\|Zi) = λ0(t) exp(Ziᵀβ), where the baseline hazard function is λ0(t) = (t − 0.5)² and the high-dimensional predictor Zi = (Zi1, …, Zipn) follows a pn-dimensional normal distribution with mean 0 and covariance matrix Σ = (0.8^\|j−j′\|) for j, j′ = 1, …, pn. The first 15 elements of β are set to be 0.2 and the rest 0. The censoring time is Ci = C̃i ∧ τ, where C̃i is generated from an exponential distribution, Exp(0.12), and the study duration τ is chosen to yield a censoring rate of 20%. We consider sample size n = 100 and 200, coupled with the dimension of predictors pn = 1000 and 2000. This leads to a total of Kn = 100 or 50 candidate models for pn = 1000 and Kn = 200 or 100 for pn = 2000. We evaluate the relative risk (RR) for a subject with predictors drawn from a pn-dimensional normal distribution with mean 0 and covariance matrix Σ, as well as the survival probability (SP) at time t0 = 2. |
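The simulation design quoted above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the baseline hazard λ0(t) = (t − 0.5)² is one plausible reading of the paper's setup, and τ is calibrated crudely here (80th percentile of the event times) rather than analytically to a 20% censoring rate as in the paper.

```python
import numpy as np

def simulate_cox_data(n=100, pn=1000, rho=0.8, seed=2020):
    """Sketch of the paper's simulation design (assumptions noted in comments)."""
    rng = np.random.default_rng(seed)

    # AR(1) recursion gives Cov(Z_ij, Z_ik) = rho**|j-k|, matching
    # Sigma = (0.8**|j-j'|) without forming the pn x pn matrix.
    Z = np.empty((n, pn))
    Z[:, 0] = rng.standard_normal(n)
    for j in range(1, pn):
        Z[:, j] = rho * Z[:, j - 1] + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)

    beta = np.zeros(pn)
    beta[:15] = 0.2                   # first 15 coefficients are 0.2, the rest 0
    eta = Z @ beta                    # linear predictor Z_i^T beta

    # Cox model via the inverse cumulative hazard: Lambda0(T) * exp(eta) ~ Exp(1).
    # ASSUMING lambda0(t) = (t - 0.5)**2, Lambda0(t) = ((t - 0.5)**3 + 0.125) / 3,
    # hence T = 0.5 + cbrt(3 * Lambda0(T) - 0.125).
    lam0 = rng.exponential(size=n) / np.exp(eta)
    T = 0.5 + np.cbrt(3.0 * lam0 - 0.125)

    # Censoring: C_i = min(Ctilde_i, tau) with Ctilde_i ~ Exp(rate 0.12);
    # NumPy's exponential takes scale = 1/rate. The paper calibrates tau to a
    # 20% censoring rate; the 80th percentile of T is a rough stand-in (the
    # realized rate also depends on Ctilde).
    Ctilde = rng.exponential(scale=1.0 / 0.12, size=n)
    tau = np.quantile(T, 0.8)
    C = np.minimum(Ctilde, tau)

    X = np.minimum(T, C)              # observed time
    delta = (T <= C).astype(int)      # event indicator (1 = event observed)
    return X, delta, Z

X, delta, Z = simulate_cox_data(n=200, pn=500)
```

Generating the AR(1) covariates recursively avoids building (and Cholesky-factoring) a pn × pn covariance matrix, which matters at pn = 1000 or 2000.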