Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Confidence Intervals and Hypothesis Testing for High-Dimensional Regression
Authors: Adel Javanmard, Andrea Montanari
JMLR 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our method on synthetic data and a high-throughput genomic data set about riboflavin production rate, made publicly available by Bühlmann et al. (2014). Keywords: hypothesis testing, confidence intervals, LASSO, high-dimensional models, bias of an estimator. Section 5 illustrates the above results through numerical simulations both on synthetic and on real data. |
| Researcher Affiliation | Academia | Adel Javanmard EMAIL Department of Electrical Engineering Stanford University Stanford, CA 94305, USA. Andrea Montanari EMAIL Department of Electrical Engineering and Department of Statistics Stanford University Stanford, CA 94305, USA |
| Pseudocode | Yes | Algorithm 1: Unbiased estimator for $\theta_0$ in high-dimensional linear regression models. Input: Measurement vector $y$, design matrix $X$, parameters $\lambda$, $\mu$. Output: Unbiased estimator $\hat{\theta}^u$. |
| Open Source Code | Yes | In the interest of reproducibility, an R implementation of our algorithm is available at http://www.stanford.edu/~montanar/sslasso/. |
| Open Datasets | Yes | We test our method on synthetic data and a high-throughput genomic data set about riboflavin production rate, made publicly available by Bühlmann et al. (2014). As a real data example, we consider a high-throughput genomic data set concerning riboflavin (vitamin B2) production rate. This data set is made publicly available by Bühlmann et al. (2014). |
| Dataset Splits | No | The paper uses synthetic data which is generated, and a real genomic dataset (riboflavin example) with n=71 samples and p=4,088 covariates. However, it does not explicitly provide details about how these datasets were split into training, validation, or test sets for the experiments presented in the paper. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions an "R implementation of our algorithm", and refers to "R-package hdi" and "R package glmnet (Friedman et al., 2010)". However, it does not specify version numbers for R itself or for the mentioned R packages. |
| Experiment Setup | Yes | We use the regularization parameter $\lambda = 4\hat{\sigma}\sqrt{(2\log p)/n}$, where $\hat{\sigma}$ is given by the scaled LASSO as per equation (31) with $\tilde{\lambda} = 10\sqrt{(2\log p)/n}$. Furthermore, parameter $\mu$ (cf. Equation 4) is set to $\mu = 2.5\sqrt{(\log p)/n}$. This choice of $\mu$ is guided by Theorem 7 (b). Throughout, we set the significance level $\alpha = 0.05$. |
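
The tuning-parameter choices quoted in the Experiment Setup row can be evaluated directly for a given sample size and dimension. The following is a minimal sketch, not the authors' R implementation: the function name `tuning_parameters` and the placeholder noise estimate `sigma_hat = 1.0` are assumptions made for illustration (in the paper, the noise level is estimated by the scaled LASSO). The dimensions n = 71 and p = 4088 are those reported for the riboflavin data set; pairing them with these formulas here is only an example.

```python
import math


def tuning_parameters(n, p, sigma_hat):
    """Evaluate the tuning parameters quoted in the Experiment Setup row.

    n         -- number of samples
    p         -- number of covariates
    sigma_hat -- noise-level estimate (hypothetical input here; the paper
                 obtains it from the scaled LASSO)
    """
    # Shared factor sqrt((2 log p) / n) appearing in both lambda formulas.
    root = math.sqrt(2.0 * math.log(p) / n)
    return {
        # Regularization parameter: lambda = 4 * sigma_hat * sqrt((2 log p) / n)
        "lambda": 4.0 * sigma_hat * root,
        # Scaled-LASSO parameter: lambda_tilde = 10 * sqrt((2 log p) / n)
        "lambda_tilde": 10.0 * root,
        # mu = 2.5 * sqrt((log p) / n)
        "mu": 2.5 * math.sqrt(math.log(p) / n),
        # Significance level used throughout.
        "alpha": 0.05,
    }


# Riboflavin dimensions from the table; sigma_hat = 1.0 is a placeholder.
params = tuning_parameters(n=71, p=4088, sigma_hat=1.0)
for name, value in params.items():
    print(f"{name} = {value:.4f}")
```

Because `lambda` scales linearly with `sigma_hat`, replacing the placeholder with an actual scaled-LASSO estimate only rescales that one entry; `lambda_tilde` and `mu` depend on n and p alone.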