Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Inference for the Case Probability in High-dimensional Logistic Regression
Authors: Zijian Guo, Prabrisha Rakshit, Daniel S. Herman, Jinbo Chen
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the proposed method via extensive simulation studies and application to real-world electronic health record data. |
| Researcher Affiliation | Academia | Zijian Guo and Prabrisha Rakshit (Department of Statistics, Rutgers University, Piscataway, New Jersey, USA); Daniel S. Herman (Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA); Jinbo Chen (Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA) |
| Pseudocode | Yes | We provide details on how to implement the LiVE estimator defined in (7). The initial estimator $\hat{\beta}$ defined in (3) is computed using cv.glmnet from the R package glmnet (Friedman et al., 2010), with the tuning parameter $\lambda$ chosen by cross-validation. To compute the projection direction $\hat{u} \in \mathbb{R}^p$, we implement the following constrained optimization: $\hat{u} = \arg\min_{u \in \mathbb{R}^p} u^\top \hat{\Sigma} u$ subject to $\lVert \hat{\Sigma} u - x_* \rVert_\infty \le \lVert x_* \rVert_2 \lambda_n$ and $\lvert x_*^\top \hat{\Sigma} u - \lVert x_* \rVert_2^2 \rvert \le \lVert x_* \rVert_2^2 \lambda_n$ (27). This construction omits the constraint (11), which is imposed mainly to facilitate the theoretical proof. We conducted an additional check in simulations and observed that the constructed $\hat{u}$ in (27) satisfies $\lVert X \hat{u} \rVert_\infty \le C \log n \, \lVert x_* \rVert_2$; see Section C.2 in the supplementary material for details. We solve the dual problem of (27), $\hat{v} = \arg\min_{v \in \mathbb{R}^{p+1}} \frac{1}{4} v^\top H^\top \hat{\Sigma} H v + b^\top H v + \lambda_n \lVert v \rVert_1$ with $H = [b, I_{p \times p}]$ and $b = x_* / \lVert x_* \rVert_2$ (28). We then recover the solution of the primal problem (27) as $\hat{u} = -(\hat{v}_{-1} + \hat{v}_1 b)/2$, where $\hat{v}_1$ is the first entry of $\hat{v}$ and $\hat{v}_{-1} \in \mathbb{R}^p$ collects the remaining entries. We refer to Proposition 2 in Cai et al. (2019) for the detailed derivation of the dual problem (28). When $\hat{\Sigma}$ is singular and the tuning parameter $\lambda_n > 0$ gets sufficiently close to 0, the dual problem cannot be solved, because its minimum value diverges to negative infinity. Hence, we choose the smallest $\lambda_n > 0$ such that the dual problem has a finite minimum value. The tuning parameter $\lambda_n$ selected in this manner is on the order of $\sqrt{\log p / n}$; the ratio $\lambda_n / \sqrt{\log p / n}$ is investigated in Section C.1 of the supplement. |
| Open Source Code | Yes | Our proposed LiVE estimator has been implemented in the R package SIHR, which is available from CRAN. |
| Open Datasets | No | We demonstrate the proposed method using Penn Medicine EHR data to identify patients with hypertension and two subsets thereof that should be screened for primary aldosteronism (PA), per specialty guidelines. The data were extracted from the Penn Medicine clinical data repository, including demographics, laboratory results, medication prescriptions, vital signs, and encounter meta information. The paper does not provide concrete access information (link, DOI, repository, or formal citation for public access) for the Penn Medicine EHR data. |
| Dataset Splits | Yes | In our analysis, we randomly sampled 30 patients as the test sample, and their predictor vectors were then treated as $x_*$. A prediction model for each outcome variable was developed using the remaining 318 patients and then applied to the test sample to obtain bias-corrected estimates of the case probabilities using our method. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions several R packages (glmnet, whose cv.glmnet function is used; hdi; SIHR) and algorithms (WLDP) but does not provide specific version numbers for these software components or for R itself. |
| Experiment Setup | Yes | The initial estimator $\hat{\beta}$ defined in (3) is computed using cv.glmnet from the R package glmnet (Friedman et al., 2010), with the tuning parameter $\lambda$ chosen by cross-validation. To compute the projection direction $\hat{u} \in \mathbb{R}^p$, we implement the following constrained optimization: $\hat{u} = \arg\min_{u \in \mathbb{R}^p} u^\top \hat{\Sigma} u$ subject to $\lVert \hat{\Sigma} u - x_* \rVert_\infty \le \lVert x_* \rVert_2 \lambda_n$ and $\lvert x_*^\top \hat{\Sigma} u - \lVert x_* \rVert_2^2 \rvert \le \lVert x_* \rVert_2^2 \lambda_n$ (27) ... Hence, we choose the smallest $\lambda_n > 0$ such that the dual problem has a finite minimum value. The tuning parameter $\lambda_n$ selected in this manner is on the order of $\sqrt{\log p / n}$; the ratio $\lambda_n / \sqrt{\log p / n}$ is investigated in Section C.1 of the supplement. We set $p = 501$ and $\Sigma = \{0.5^{1 + \lvert j - l \rvert}\}_{1 \le j, l \le p-1}$, and vary $n \in \{200, 400, 600\}$. |
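The Experiment Setup row fixes the simulation covariance $\Sigma_{jl} = 0.5^{1+\lvert j-l \rvert}$. The NumPy sketch below builds that design. Two assumptions are not stated in this excerpt. First, column 0 of the design is an intercept; this is our reading of why $\Sigma$ is $(p-1)\times(p-1)$ when $p = 501$. Second, the covariates are Gaussian. `simulate_design` is a hypothetical helper. The sketch also tabulates the $\sqrt{\log p/n}$ scale at which the Pseudocode row says $\lambda_n$ ends up.

```python
import numpy as np

# Design from the Experiment Setup row: p = 501, Sigma is (p - 1) x (p - 1).
# ASSUMPTIONS (not in the excerpt): column 0 of X is an intercept, and the
# remaining covariates are drawn i.i.d. from N(0, Sigma).
p = 501
j = np.arange(p - 1)
Sigma = 0.5 ** (1 + np.abs(j[:, None] - j[None, :]))  # Sigma_{jl} = 0.5^{1+|j-l|}


def simulate_design(n: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: n x p design = intercept column + N(0, Sigma) rows."""
    Z = rng.multivariate_normal(np.zeros(p - 1), Sigma, size=n)
    return np.column_stack([np.ones(n), Z])


rng = np.random.default_rng(0)
designs = {n: simulate_design(n, rng) for n in (200, 400, 600)}

# Reference scale sqrt(log p / n) for the tuning parameter lambda_n.
lambda_scale = {n: np.sqrt(np.log(p) / n) for n in designs}
```

$\Sigma$ is $0.5$ times an AR(1) correlation matrix with $\rho = 0.5$, so it is positive definite and the Gaussian draw is well defined.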
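The dual (28) quoted in the Pseudocode row is a lasso-type problem, so even cyclic coordinate descent can handle small instances. The sketch below is ours, not the SIHR implementation. `solve_dual_cd` and `soft_threshold` are hypothetical helpers. Because the excerpt does not define $\hat\Sigma$, the sketch substitutes the unweighted Gram matrix $X^\top X/n$ of a simulated Gaussian design, with $n > p$ so that it is nonsingular.

The coordinate-wise optimality conditions of (28) give $\lVert H^\top(\hat\Sigma \hat u - b) \rVert_\infty \le \lambda_n$ for $\hat u = -(\hat v_{-1} + \hat v_1 b)/2$. Equivalently, $\lVert \hat\Sigma \hat u - b \rVert_\infty \le \lambda_n$ and $\lvert b^\top \hat\Sigma \hat u - 1 \rvert \le \lambda_n$. Multiplying $\hat u$ by $\lVert x_* \rVert_2$ yields the constraints of (27).

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Scalar soft-thresholding S(z, t) = sign(z) * max(|z| - t, 0)."""
    return float(np.sign(z)) * max(abs(z) - t, 0.0)


def solve_dual_cd(Sigma_hat, x_star, lam, max_sweeps=20_000, tol=1e-12):
    """Hypothetical helper: cyclic coordinate descent for the dual (28),

        min_v  1/4 v^T H^T Sigma_hat H v + b^T H v + lam * ||v||_1,
        H = [b, I_p],  b = x_star / ||x_star||_2.

    Returns v_hat, u_hat = -(v_hat[1:] + v_hat[0] * b) / 2, and b.
    """
    p = Sigma_hat.shape[0]
    b = x_star / np.linalg.norm(x_star)
    H = np.column_stack([b, np.eye(p)])   # p x (p + 1)
    M = H.T @ Sigma_hat @ H               # M / 2 is the Hessian of the quadratic term
    c = H.T @ b                           # gradient of the linear term b^T H v
    v = np.zeros(p + 1)
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p + 1):
            # Derivative of the smooth part in v_j, excluding the diagonal term.
            r = 0.5 * (M[j] @ v - M[j, j] * v[j]) + c[j]
            # Exact minimizer over v_j of (M_jj / 4) v_j^2 + r v_j + lam |v_j|.
            new_vj = soft_threshold(-r, lam) / (0.5 * M[j, j])
            max_change = max(max_change, abs(new_vj - v[j]))
            v[j] = new_vj
        if max_change < tol:
            break
    u = -(v[1:] + v[0] * b) / 2.0         # primal recovery from the dual solution
    return v, u, b


# Toy data (assumption): Gaussian design with n > p, so Sigma_hat is nonsingular.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
Sigma_hat = X.T @ X / n                   # stand-in for the paper's Sigma_hat
x_star = np.array([1.0, 0.5, 0.0, -0.5, 2.0])
lam = np.sqrt(np.log(p) / n)

v_hat, u_hat, b = solve_dual_cd(Sigma_hat, x_star, lam)
# u_hat solves the problem for the normalized loading b; rescaling by
# ||x_star||_2 gives a direction satisfying the constraints of (27) for x_star.
u_for_x_star = np.linalg.norm(x_star) * u_hat
```

Coordinate descent is used here only because it fits in a few lines; the excerpt does not say which solver SIHR uses. For singular $\hat\Sigma$, such a solver should only be run with $\lambda_n$ above the threshold at which the dual becomes bounded.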
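The Pseudocode row chooses the smallest $\lambda_n$ for which the dual (28) has a finite minimum. By our own derivation, which is not stated in the excerpt, the dual is bounded below exactly when the normalized primal constraint set is nonempty. The threshold therefore equals $\min_u \max(\lVert \hat\Sigma u - b \rVert_\infty, \lvert b^\top \hat\Sigma u - 1 \rvert)$, which is a linear program.

The sketch below solves that LP with SciPy's `linprog`. It is one possible way to locate the threshold, not necessarily the authors' procedure, and `smallest_feasible_lambda` is a hypothetical helper. For nonsingular $\hat\Sigma$ the threshold is 0, which is why the rule only matters when $\hat\Sigma$ is singular. The sketch also computes the ratio $\lambda_n/\sqrt{\log p/n}$ on simulated data.

```python
import numpy as np
from scipy.optimize import linprog


def smallest_feasible_lambda(Sigma_hat: np.ndarray, x_star: np.ndarray):
    """Hypothetical helper: return (lam_min, u) with
    lam_min = min_u ||H^T (Sigma_hat u - b)||_inf, H = [b, I_p], b = x_star / ||x_star||_2,
    i.e. min_u max(|b^T Sigma_hat u - 1|, ||Sigma_hat u - b||_inf).
    Solved as an LP in (u, t): minimize t s.t. -t <= (A u - c)_k <= t.
    """
    p = Sigma_hat.shape[0]
    b = x_star / np.linalg.norm(x_star)
    H = np.column_stack([b, np.eye(p)])           # p x (p + 1)
    A = H.T @ Sigma_hat                           # (p + 1) x p
    c = H.T @ b                                   # (p + 1,)
    minus_ones = -np.ones((p + 1, 1))
    A_ub = np.vstack([np.hstack([A, minus_ones]),     #  A u - t <=  c
                      np.hstack([-A, minus_ones])])   # -A u - t <= -c
    b_ub = np.concatenate([c, -c])
    cost = np.zeros(p + 1)
    cost[-1] = 1.0                                # minimize t only
    bounds = [(None, None)] * p + [(0, None)]     # u free, t >= 0
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun, res.x[:p]


rng = np.random.default_rng(1)

# Singular case (n < p): by the derivation above, the dual (28) is unbounded
# below for every lam < lam_min.
n, p = 40, 100
X = rng.standard_normal((n, p))
Sigma_hat = X.T @ X / n
x_star = rng.standard_normal(p)
lam_min, u_feas = smallest_feasible_lambda(Sigma_hat, x_star)
ratio = lam_min / np.sqrt(np.log(p) / n)          # simulated analogue of the ratio in Section C.1

# Nonsingular case (n > p): u = Sigma_hat^{-1} b is feasible, so the threshold is 0.
X_full = rng.standard_normal((200, 5))
lam_min_full_rank, _ = smallest_feasible_lambda(X_full.T @ X_full / 200, rng.standard_normal(5))
```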
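The Dataset Splits row amounts to a random partition of 30 + 318 = 348 patients. Because the EHR data are not public, the minimal sketch below shows only the index split. `split_patients` is a hypothetical helper and the seed is arbitrary. Each test patient's covariate row would then play the role of $x_*$, with the model fitted on the training indices.

```python
import numpy as np

# 30 test patients and 318 training patients, as stated in the Dataset Splits row.
N_TEST, N_TRAIN = 30, 318


def split_patients(n_total, n_test, seed=None):
    """Hypothetical helper: randomly assign n_test indices to test, the rest to train."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_total)
    return np.sort(perm[:n_test]), np.sort(perm[n_test:])


test_idx, train_idx = split_patients(N_TEST + N_TRAIN, N_TEST, seed=0)
```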