Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Estimating the Lasso's Effective Noise

Authors: Johannes Lederer, Michael Vogt

JMLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our theoretical analysis is complemented by a simulation study in Section 5, which investigates the finite-sample performance of our methods.
Researcher Affiliation Academia Johannes Lederer, Department of Mathematics, Ruhr-University Bochum, 44801 Bochum, Germany; Michael Vogt, Institute of Statistics, Department of Mathematics and Economics, Ulm University, 89081 Ulm, Germany
Pseudocode Yes In practice, ˆλα can be computed by the following algorithm: Step 1: For some large natural number M, specify a grid of points 0 < λ1 < . . . < λM = λ̄, where λ̄ = 2‖X⊤Y‖∞/n is the smallest tuning parameter λ for which ˆβλ equals zero. Simulate L samples e(1), . . . , e(L) of the standard normal random vector e. Step 2: For each grid point 1 ≤ m ≤ M, compute the values of the criterion function { ˆQ(λm, e(ℓ)) : 1 ≤ ℓ ≤ L} and calculate the empirical (1 − α)-quantile ˆqα,emp(λm) from them. Step 3: Approximate ˆλα by ˆλα,emp := ˆqα,emp(λ ˆm), where ˆm = min{m : ˆqα,emp(λm′) ≤ λm′ for all m′ ≥ m} if ˆqα,emp(λM) ≤ λM and ˆm = M otherwise.
Open Source Code No The paper discusses the use of 'glmnet', a third-party tool, but does not provide any statement or link for the authors' own implementation code. For example: 'The lasso paths are computed through glmnet (Friedman et al., 2010) version 2.2.1'
Open Datasets No In Section 5, the authors state: 'We simulate data from the linear regression model (6) with sample size n = 500 and dimension p ∈ {250, 500, 1000}.' This indicates the use of simulated data, not a publicly available dataset.
Dataset Splits No The paper describes generating synthetic data for simulations but does not discuss splitting an existing dataset into training, test, or validation sets.
Hardware Specification No The paper describes the Monte Carlo experiments but does not provide any specific hardware details such as CPU, GPU, or memory specifications.
Software Dependencies Yes The lasso paths are computed through glmnet (Friedman et al., 2010) version 2.2.1... The implementations are in R version 3.5.1.
Experiment Setup Yes We simulate data from the linear regression model (6) with sample size n = 500 and dimension p ∈ {250, 500, 1000}. The covariate vectors Xi = (Xi1, . . . , Xip) are independently sampled from a p-dimensional normal distribution with mean 0 and covariance matrix (1 − κ)I + κE...We show the simulation results for κ = 0.25 unless indicated differently...The noise variables εi are drawn i.i.d. from a normal distribution with mean 0 and variance σ2 = 1. The target vector β has the form β = (c, . . . , c, 0, . . . , 0)...We set SNR = 1 except when we analyze the hypothesis tests from Section 4.2... We implement our estimation method with L = 100 bootstrap replicates... The lasso paths are computed through glmnet (Friedman et al., 2010) version 2.2.1 with an equidistant grid of λ-values and M = 100... All Monte Carlo experiments are based on N = 1000 simulation runs.
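The Step 1–3 algorithm quoted in the Pseudocode row can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper's criterion function ˆQ(λ, e) is not reproduced in the quote, so it is passed in here as a user-supplied callable `Q`, and the grid is taken to be equidistant (as in the experiment setup); all function names are ours.

```python
import numpy as np

def estimate_lambda_alpha(X, Y, Q, alpha=0.05, M=100, L=100, seed=None):
    """Sketch of Steps 1-3 for approximating lambda_hat_alpha.

    Q(lam, e, X, Y) stands in for the paper's criterion function
    Q_hat(lam, e); supplying it as a callable is an assumption of
    this sketch.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: equidistant grid 0 < lam_1 < ... < lam_M = lam_bar, where
    # lam_bar = 2 * ||X^T Y||_inf / n is the smallest tuning parameter
    # for which the lasso solution equals zero; then simulate L samples
    # of the standard normal multiplier vector e.
    lam_bar = 2.0 * np.max(np.abs(X.T @ Y)) / n
    grid = np.linspace(lam_bar / M, lam_bar, M)
    e_samples = rng.standard_normal((L, n))
    # Step 2: empirical (1 - alpha)-quantile of the criterion values
    # {Q(lam_m, e^(l)) : 1 <= l <= L} at each grid point.
    q_emp = np.array([
        np.quantile([Q(lam, e, X, Y) for e in e_samples], 1.0 - alpha)
        for lam in grid
    ])
    # Step 3: m_hat = min{m : q_emp(lam_m') <= lam_m' for all m' >= m}
    # if q_emp(lam_M) <= lam_M, and m_hat = M otherwise.
    ok = q_emp <= grid
    if ok[-1]:
        violations = np.where(~ok)[0]
        m_hat = 0 if violations.size == 0 else violations[-1] + 1
    else:
        m_hat = M - 1
    return q_emp[m_hat]
```

The fixed-point search in Step 3 scans for the smallest grid index beyond which the quantile curve stays below the identity line; only that structural logic is taken from the quote, and any real use would need the paper's actual criterion plugged in as `Q`.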
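The simulation design quoted in the Experiment Setup row can be sketched as a data-generating function. The sparsity level `s` and coefficient value `c` below are illustrative placeholders: the quote fixes neither, and the paper calibrates the signal via SNR = 1; the function name is ours.

```python
import numpy as np

def simulate_design(n=500, p=250, kappa=0.25, s=5, c=1.0, seed=None):
    """Draw one dataset from the quoted design (s and c are
    illustrative assumptions, not values stated in the quote)."""
    rng = np.random.default_rng(seed)
    # Covariates: N(0, (1 - kappa) I + kappa E), with E the all-ones matrix.
    Sigma = (1.0 - kappa) * np.eye(p) + kappa * np.ones((p, p))
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    # Target vector beta = (c, ..., c, 0, ..., 0): s leading entries equal c.
    beta = np.zeros(p)
    beta[:s] = c
    # Noise: i.i.d. N(0, 1), i.e. sigma^2 = 1.
    eps = rng.standard_normal(n)
    Y = X @ beta + eps
    return X, Y, beta
```

The equicorrelation covariance (1 − κ)I + κE is positive definite for κ ∈ [0, 1), so the multivariate normal draw is well defined for the quoted κ = 0.25.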