Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Confidence Intervals and Hypothesis Testing for High-Dimensional Regression

Authors: Adel Javanmard, Andrea Montanari

JMLR 2014 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We test our method on synthetic data and a high-throughput genomic data set about riboflavin production rate, made publicly available by Bühlmann et al. (2014). Keywords: hypothesis testing, confidence intervals, LASSO, high-dimensional models, bias of an estimator. Section 5 illustrates the above results through numerical simulations both on synthetic and on real data.
Researcher Affiliation	Academia	Adel Javanmard, Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA. Andrea Montanari, Department of Electrical Engineering and Department of Statistics, Stanford University, Stanford, CA 94305, USA.
Pseudocode	Yes	Algorithm 1 Unbiased estimator for $\theta_0$ in high-dimensional linear regression models. Input: Measurement vector $y$, design matrix $X$, parameters $\lambda$, $\mu$. Output: Unbiased estimator $\hat{\theta}^u$.
Open Source Code	Yes	In the interest of reproducibility, an R implementation of our algorithm is available at http://www.stanford.edu/~montanar/sslasso/.
Open Datasets	Yes	We test our method on synthetic data and a high-throughput genomic data set about riboflavin production rate, made publicly available by Bühlmann et al. (2014). As a real data example, we consider a high-throughput genomic data set concerning riboflavin (vitamin B2) production rate. This data set is made publicly available by Bühlmann et al. (2014).
Dataset Splits	No	The paper uses synthetic data which is generated, and a real genomic dataset (riboflavin example) with n=71 samples and p=4,088 covariates. However, it does not explicitly provide details about how these datasets were split into training, validation, or test sets for the experiments presented in the paper.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies	No	The paper mentions an "R implementation of our algorithm", and refers to "R-package hdi" and "R package glmnet (Friedman et al., 2010)". However, it does not specify version numbers for R itself or for the mentioned R packages.
Experiment Setup	Yes	We use the regularization parameter $\lambda = 4\hat{\sigma}\sqrt{(2\log p)/n}$, where $\hat{\sigma}$ is given by the scaled LASSO as per equation (31) with $\tilde{\lambda} = 10\sqrt{(2\log p)/n}$. Furthermore, parameter $\mu$ (cf. Equation 4) is set to $\mu = 2.5\sqrt{(\log p)/n}$. This choice of $\mu$ is guided by Theorem 7 (b). Throughout, we set the significance level $\alpha = 0.05$.
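
The tuning parameters quoted in the Experiment Setup row can be evaluated numerically. The following is a minimal sketch, not the authors' R implementation. It assumes that the garbled "p(...)" in the extracted text stands for a square root, and it takes the constants (4, 10, 2.5) as they appear in the row above. The function name `tuning_parameters` and the placeholder value `sigma_hat=1.0` are hypothetical; in the paper the noise estimate comes from the scaled LASSO, which is not reproduced here. The values n = 71 and p = 4088 are those reported for the riboflavin data set in the Dataset Splits row.

```python
import math


def tuning_parameters(n, p, sigma_hat):
    """Evaluate the tuning parameters quoted in the Experiment Setup row.

    Assumption: the extracted "p(...)" is read as a square root.
    sigma_hat is a caller-supplied noise estimate (hypothetical input;
    the paper obtains it from the scaled LASSO).
    """
    # Regularization parameter for the LASSO step: 4 * sigma_hat * sqrt(2 log p / n)
    lam = 4.0 * sigma_hat * math.sqrt(2.0 * math.log(p) / n)
    # Regularization parameter used inside the scaled LASSO: 10 * sqrt(2 log p / n)
    lam_scaled = 10.0 * math.sqrt(2.0 * math.log(p) / n)
    # Parameter mu of Algorithm 1: 2.5 * sqrt(log p / n)
    mu = 2.5 * math.sqrt(math.log(p) / n)
    return {"lambda": lam, "lambda_scaled": lam_scaled, "mu": mu}


# Riboflavin dimensions from the table; sigma_hat = 1.0 is a placeholder.
params = tuning_parameters(n=71, p=4088, sigma_hat=1.0)
alpha = 0.05  # significance level stated in the row above
```

Because `lambda` scales linearly with `sigma_hat`, replacing the placeholder with an actual noise estimate only rescales that one entry; `lambda_scaled` and `mu` depend on n and p alone.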