Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Functional Martingale Residual Process for High-Dimensional Cox Regression with Model Averaging
Authors: Baihua He, Yanyan Liu, Yuanshan Wu, Guosheng Yin, Xingqiu Zhao
JMLR 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The performances of the proposed model averaging procedures are evaluated via extensive simulation studies, demonstrating that our methods achieve superior prediction accuracy over the existing regularization methods. As an illustration, we apply the proposed methods to the mantle cell lymphoma study. |
| Researcher Affiliation | Academia | Baihua He and Yanyan Liu, School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China; Yuanshan Wu, School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, China; Guosheng Yin, Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong; Xingqiu Zhao, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong |
| Pseudocode | Yes | Algorithm 1 Greedy model averaging algorithm based on the ECV criterion |
| Open Source Code | No | The paper does not explicitly state that source code is provided or offer a link to a repository for the described methodology. |
| Open Datasets | Yes | As an illustration, we apply the proposed model averaging approaches to the mantle cell lymphoma (MCL) study, which was also analyzed by Rosenwald et al. (2003). The gene expression data set is available from http://llmpp.nih.gov/MCL/ |
| Dataset Splits | Yes | The delete-one CV procedure, which is also called the n-fold CV, is advocated for the proposed model averaging methods. Nevertheless, our methods can be readily coupled with general ν-fold CV with ν < n. We consider ν = 5 and 10 to investigate the performances of the proposed methods. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models) used for running the experiments or simulations. |
| Software Dependencies | No | The paper mentions various regularization methods like LASSO, MCP, SCAD, Elastic Net, Ridge, and ALASSO, but does not specify any software names with version numbers. |
| Experiment Setup | Yes | We generate survival time Ti from the Cox proportional hazards model, λ(t\|Zi) = λ0(t) exp(Ziᵀβ), where the baseline hazard function is λ0(t) = (t − 0.5)² and the high-dimensional predictor Zi = (Zi1, …, Zipn) follows a pn-dimensional normal distribution with mean 0 and covariance matrix Σ = (0.8^\|j−j′\|) for j, j′ = 1, …, pn. The first 15 elements of β are set to be 0.2 and the rest 0. The censoring time is Ci = C̃i ∧ τ, where C̃i is generated from an exponential distribution, Exp(0.12), and the study duration τ is chosen to yield a censoring rate of 20%. We consider sample size n = 100 and 200, coupled with the dimension of predictors pn = 1000 and 2000. This leads to a total of Kn = 100 or 50 candidate models for pn = 1000 and Kn = 200 or 100 for pn = 2000. We evaluate the relative risk (RR) for a subject with predictors drawn from a pn-dimensional normal distribution with mean 0 and covariance matrix Σ, as well as the survival probability (SP) at time t0 = 2. |
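The simulation design quoted above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the baseline hazard λ0(t) = (t − 0.5)² is one plausible reading of the paper's setup, and τ is calibrated crudely here (80th percentile of the event times) rather than analytically to a 20% censoring rate as in the paper.

```python
import numpy as np

def simulate_cox_data(n=100, pn=1000, rho=0.8, seed=2020):
    """Sketch of the paper's simulation design (assumptions noted in comments)."""
    rng = np.random.default_rng(seed)

    # AR(1) recursion gives Cov(Z_ij, Z_ik) = rho**|j-k|, matching
    # Sigma = (0.8**|j-j'|) without forming the pn x pn matrix.
    Z = np.empty((n, pn))
    Z[:, 0] = rng.standard_normal(n)
    for j in range(1, pn):
        Z[:, j] = rho * Z[:, j - 1] + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)

    beta = np.zeros(pn)
    beta[:15] = 0.2                   # first 15 coefficients are 0.2, the rest 0
    eta = Z @ beta                    # linear predictor Z_i^T beta

    # Cox model via the inverse cumulative hazard: Lambda0(T) * exp(eta) ~ Exp(1).
    # ASSUMING lambda0(t) = (t - 0.5)**2, Lambda0(t) = ((t - 0.5)**3 + 0.125) / 3,
    # hence T = 0.5 + cbrt(3 * Lambda0(T) - 0.125).
    lam0 = rng.exponential(size=n) / np.exp(eta)
    T = 0.5 + np.cbrt(3.0 * lam0 - 0.125)

    # Censoring: C_i = min(Ctilde_i, tau) with Ctilde_i ~ Exp(rate 0.12);
    # NumPy's exponential takes scale = 1/rate. The paper calibrates tau to a
    # 20% censoring rate; the 80th percentile of T is a rough stand-in (the
    # realized rate also depends on Ctilde).
    Ctilde = rng.exponential(scale=1.0 / 0.12, size=n)
    tau = np.quantile(T, 0.8)
    C = np.minimum(Ctilde, tau)

    X = np.minimum(T, C)              # observed time
    delta = (T <= C).astype(int)      # event indicator (1 = event observed)
    return X, delta, Z

X, delta, Z = simulate_cox_data(n=200, pn=500)
```

Generating the AR(1) covariates recursively avoids building (and Cholesky-factoring) a pn × pn covariance matrix, which matters at pn = 1000 or 2000.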