Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Estimating the Lasso's Effective Noise

Authors: Johannes Lederer, Michael Vogt

JMLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our theoretical analysis is complemented by a simulation study in Section 5, which investigates the finite-sample performance of our methods.
Researcher Affiliation Academia Johannes Lederer, Department of Mathematics, Ruhr-University Bochum, 44801 Bochum, Germany; Michael Vogt, Institute of Statistics, Department of Mathematics and Economics, Ulm University, 89081 Ulm, Germany
Pseudocode Yes In practice, ˆλα can be computed by the following algorithm: Step 1: For some large natural number M, specify a grid of points 0 < λ1 < . . . < λM = λ̄, where λ̄ = 2‖X⊤Y‖∞/n is the smallest tuning parameter λ for which ˆβλ equals zero. Simulate L samples e(1), . . . , e(L) of the standard normal random vector e. Step 2: For each grid point 1 ≤ m ≤ M, compute the values of the criterion function { ˆQ(λm, e(ℓ)) : 1 ≤ ℓ ≤ L} and calculate the empirical (1 − α)-quantile ˆqα,emp(λm) from them. Step 3: Approximate ˆλα by ˆλα,emp := ˆqα,emp(λ ˆm), where ˆm = min{m : ˆqα,emp(λm′) ≤ λm′ for all m′ ≥ m} if ˆqα,emp(λM) ≤ λM and ˆm = M otherwise.
Open Source Code No The paper discusses the use of 'glmnet', a third-party tool, but does not provide any statement or link for the authors' own implementation code. For example: 'The lasso paths are computed through glmnet (Friedman et al., 2010) version 2.2.1'
Open Datasets No In Section 5, the authors state: 'We simulate data from the linear regression model (6) with sample size n = 500 and dimension p ∈ {250, 500, 1000}.' This indicates the use of simulated data, not a publicly available dataset.
Dataset Splits No The paper describes generating synthetic data for simulations but does not discuss splitting an existing dataset into training, test, or validation sets.
Hardware Specification No The paper describes the Monte Carlo experiments but does not provide any specific hardware details such as CPU, GPU, or memory specifications.
Software Dependencies Yes The lasso paths are computed through glmnet (Friedman et al., 2010) version 2.2.1... The implementations are in R version 3.5.1.
Experiment Setup Yes We simulate data from the linear regression model (6) with sample size n = 500 and dimension p ∈ {250, 500, 1000}. The covariate vectors Xi = (Xi1, . . . , Xip) are independently sampled from a p-dimensional normal distribution with mean 0 and covariance matrix (1 − κ)I + κE...We show the simulation results for κ = 0.25 unless indicated differently...The noise variables εi are drawn i.i.d. from a normal distribution with mean 0 and variance σ2 = 1. The target vector β has the form β = (c, . . . , c, 0, . . . , 0)...We set SNR = 1 except when we analyze the hypothesis tests from Section 4.2... We implement our estimation method with L = 100 bootstrap replicates... The lasso paths are computed through glmnet (Friedman et al., 2010) version 2.2.1 with an equidistant grid of λ-values and M = 100... All Monte Carlo experiments are based on N = 1000 simulation runs.
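The Step 1–3 algorithm quoted in the Pseudocode row can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper's criterion function ˆQ(λ, e) is not reproduced in the quote, so it is passed in here as a user-supplied callable `Q`, and the grid is taken to be equidistant (as in the experiment setup); all function names are ours.

```python
import numpy as np

def estimate_lambda_alpha(X, Y, Q, alpha=0.05, M=100, L=100, seed=None):
    """Sketch of Steps 1-3 for approximating lambda_hat_alpha.

    Q(lam, e, X, Y) stands in for the paper's criterion function
    Q_hat(lam, e); supplying it as a callable is an assumption of
    this sketch.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: equidistant grid 0 < lam_1 < ... < lam_M = lam_bar, where
    # lam_bar = 2 * ||X^T Y||_inf / n is the smallest tuning parameter
    # for which the lasso solution equals zero; then simulate L samples
    # of the standard normal multiplier vector e.
    lam_bar = 2.0 * np.max(np.abs(X.T @ Y)) / n
    grid = np.linspace(lam_bar / M, lam_bar, M)
    e_samples = rng.standard_normal((L, n))
    # Step 2: empirical (1 - alpha)-quantile of the criterion values
    # {Q(lam_m, e^(l)) : 1 <= l <= L} at each grid point.
    q_emp = np.array([
        np.quantile([Q(lam, e, X, Y) for e in e_samples], 1.0 - alpha)
        for lam in grid
    ])
    # Step 3: m_hat = min{m : q_emp(lam_m') <= lam_m' for all m' >= m}
    # if q_emp(lam_M) <= lam_M, and m_hat = M otherwise.
    ok = q_emp <= grid
    if ok[-1]:
        violations = np.where(~ok)[0]
        m_hat = 0 if violations.size == 0 else violations[-1] + 1
    else:
        m_hat = M - 1
    return q_emp[m_hat]
```

The fixed-point search in Step 3 scans for the smallest grid index beyond which the quantile curve stays below the identity line; only that structural logic is taken from the quote, and any real use would need the paper's actual criterion plugged in as `Q`.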
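The simulation design quoted in the Experiment Setup row can be sketched as a data-generating function. The sparsity level `s` and coefficient value `c` below are illustrative placeholders: the quote fixes neither, and the paper calibrates the signal via SNR = 1; the function name is ours.

```python
import numpy as np

def simulate_design(n=500, p=250, kappa=0.25, s=5, c=1.0, seed=None):
    """Draw one dataset from the quoted design (s and c are
    illustrative assumptions, not values stated in the quote)."""
    rng = np.random.default_rng(seed)
    # Covariates: N(0, (1 - kappa) I + kappa E), with E the all-ones matrix.
    Sigma = (1.0 - kappa) * np.eye(p) + kappa * np.ones((p, p))
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    # Target vector beta = (c, ..., c, 0, ..., 0): s leading entries equal c.
    beta = np.zeros(p)
    beta[:s] = c
    # Noise: i.i.d. N(0, 1), i.e. sigma^2 = 1.
    eps = rng.standard_normal(n)
    Y = X @ beta + eps
    return X, Y, beta
```

The equicorrelation covariance (1 − κ)I + κE is positive definite for κ ∈ [0, 1), so the multivariate normal draw is well defined for the quoted κ = 0.25.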