Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Inference for the Case Probability in High-dimensional Logistic Regression

Authors: Zijian Guo, Prabrisha Rakshit, Daniel S. Herman, Jinbo Chen

JMLR 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the proposed method via extensive simulation studies and application to real-world electronic health record data.
Researcher Affiliation	Academia	Zijian Guo and Prabrisha Rakshit: Department of Statistics, Rutgers University, Piscataway, New Jersey, USA. Daniel S. Herman: Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA. Jinbo Chen: Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Pseudocode	Yes	We provide details on how to implement the LiVE estimator defined in (7). The initial estimator $\hat{\beta}$ defined in (3) is computed using the R-package cv.glmnet (Friedman et al., 2010) with the tuning parameter $\lambda$ chosen by cross-validation. To compute the projection direction $\hat{u} \in \mathbb{R}^p$, we implement the following constrained optimization, $\hat{u} = \arg\min_{u \in \mathbb{R}^p} u^\top \hat{\Sigma} u$ subject to $\lVert \hat{\Sigma} u - x_* \rVert_\infty \le \lVert x_* \rVert_2 \lambda_n$, $\lvert x_*^\top \hat{\Sigma} u - \lVert x_* \rVert_2^2 \rvert \le \lVert x_* \rVert_2^2 \lambda_n$. (27) This construction does not include the constraint (11), which is mainly imposed to facilitate the theoretical proof. We have conducted an additional check in simulations and observed that our constructed $\hat{u}$ in (27) satisfies $\lVert X \hat{u} \rVert_\infty \le C \sqrt{\log n}\, \lVert x_* \rVert_2$; see Section C.2 in the supplementary material for details. We solve the dual problem of (27), $\hat{v} = \arg\min_{v \in \mathbb{R}^{p+1}} \frac{1}{4} v^\top H^\top \hat{\Sigma} H v + b^\top H v + \lambda_n \lVert v \rVert_1$ with $H = [b, I_{p \times p}]$, $b = \frac{1}{\lVert x_* \rVert_2} x_*$ (28) and then solve the primal problem (27) as $\hat{u} = -(\hat{v}_{-1} + \hat{v}_1 b)/2$. We refer to Proposition 2 in Cai et al. (2019) for the detailed derivation of the dual problem (28). In this dual problem, when $\hat{\Sigma}$ is singular and the tuning parameter $\lambda_n > 0$ gets sufficiently close to 0, the dual problem cannot be solved as the minimum value converges to negative infinity. Hence, we choose the smallest $\lambda_n > 0$ such that the dual problem has a finite minimum value. The tuning parameter $\lambda_n$ selected in this manner is at the scale of $\sqrt{\log p / n}$. We investigate the ratio $\lambda_n / \sqrt{\log p / n}$ in Section C.1 in the supplement.
Open Source Code	Yes	Our proposed LiVE estimator has been implemented in the R package SIHR, which is available from CRAN.
Open Datasets	No	We demonstrate the proposed method using Penn Medicine EHR data to identify patients with hypertension and two subsets thereof that should be screened for PA, per specialty guidelines. The data were extracted from the Penn Medicine clinical data repository, including demographics, laboratory results, medication prescriptions, vital signs, and encounter meta information. The paper does not provide concrete access information (link, DOI, repository, or formal citation for public access) for the Penn Medicine EHR data.
Dataset Splits	Yes	In our analysis, we randomly sampled 30 patients as the test sample, then their predictor vectors were treated as $x_*$. A prediction model for each outcome variable was developed using the remaining 318 patients and then applied to the test sample to obtain bias-corrected estimates of the case probabilities using our method.
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	The paper mentions several R packages (cv.glmnet, hdi, SIHR) and algorithms (WLDP) but does not provide specific version numbers for these software components or for R itself.
Experiment Setup	Yes	The initial estimator $\hat{\beta}$ defined in (3) is computed using the R-package cv.glmnet (Friedman et al., 2010) with the tuning parameter $\lambda$ chosen by cross-validation. To compute the projection direction $\hat{u} \in \mathbb{R}^p$, we implement the following constrained optimization, $\hat{u} = \arg\min_{u \in \mathbb{R}^p} u^\top \hat{\Sigma} u$ subject to $\lVert \hat{\Sigma} u - x_* \rVert_\infty \le \lVert x_* \rVert_2 \lambda_n$, $\lvert x_*^\top \hat{\Sigma} u - \lVert x_* \rVert_2^2 \rvert \le \lVert x_* \rVert_2^2 \lambda_n$. (27) ... Hence, we choose the smallest $\lambda_n > 0$ such that the dual problem has a finite minimum value. The tuning parameter $\lambda_n$ selected in this manner is at the scale of $\sqrt{\log p / n}$. We investigate the ratio $\lambda_n / \sqrt{\log p / n}$ in Section C.1 in the supplement. We set $p = 501$, $\Sigma = \{0.5^{1 + \lvert j - l \rvert}\}_{1 \le j, l \le p - 1}$ and vary $n \in \{200, 400, 600\}$.
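
The simulation design quoted in the Experiment Setup row (p = 501, a covariance with entries 0.5 raised to 1 + |j - l| over p - 1 covariates, and n in {200, 400, 600}) can be sketched in Python as below. This is a minimal illustration, not the authors' code. It assumes zero-mean Gaussian covariates and a leading intercept column of ones, neither of which is stated in the rows above. The function names `sigma_entry`, `sample_covariates`, and `simulate_design` are hypothetical. Because this covariance is that of a stationary AR(1) process with variance 0.5 and lag-one correlation 0.5, the sketch samples it by recursion instead of factorizing the matrix.

```python
import math
import random

P = 501                       # total number of coefficients (from the source)
SAMPLE_SIZES = (200, 400, 600)  # values of n (from the source)
RHO = 0.5                     # base of the covariance entries (from the source)


def sigma_entry(j, l):
    """Covariance between covariates j and l (1-based): 0.5 ** (1 + |j - l|)."""
    return RHO ** (1 + abs(j - l))


def sample_covariates(n_cov, rng):
    """Draw one covariate vector whose covariance matches sigma_entry.

    Assumption: covariates are zero-mean Gaussian. The covariance is AR(1)
    with variance 0.5 and correlation 0.5, so a stationary recursion suffices.
    """
    var = sigma_entry(1, 1)
    innov_sd = math.sqrt(var * (1.0 - RHO ** 2))
    x = [rng.gauss(0.0, math.sqrt(var))]
    for _ in range(n_cov - 1):
        x.append(RHO * x[-1] + innov_sd * rng.gauss(0.0, 1.0))
    return x


def simulate_design(n, p=P, seed=0):
    """Return an n x p design matrix as a list of rows.

    Assumption: the first column is an intercept of ones, which is why the
    covariance only covers p - 1 covariates.
    """
    rng = random.Random(seed)
    return [[1.0] + sample_covariates(p - 1, rng) for _ in range(n)]


if __name__ == "__main__":
    for n in SAMPLE_SIZES:
        X = simulate_design(n)
        print(n, len(X), len(X[0]))
```

Each design returned by `simulate_design` could then be passed to a penalized logistic regression fit; that step is omitted here because the outcome model and coefficient vector are not given in the rows above.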