Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Estimating the Lasso's Effective Noise
Authors: Johannes Lederer, Michael Vogt
JMLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical analysis is complemented by a simulation study in Section 5, which investigates the finite-sample performance of our methods. |
| Researcher Affiliation | Academia | Johannes Lederer, Department of Mathematics, Ruhr-University Bochum, 44801 Bochum, Germany; Michael Vogt, Institute of Statistics, Department of Mathematics and Economics, Ulm University, 89081 Ulm, Germany |
| Pseudocode | Yes | In practice, λ̂_α can be computed by the following algorithm: Step 1: For some large natural number M, specify a grid of points 0 < λ_1 < … < λ_M = λ̄, where λ̄ = 2‖X⊤Y‖_∞/n is the smallest tuning parameter λ for which β̂_λ equals zero. Simulate L samples e^(1), …, e^(L) of the standard normal random vector e. Step 2: For each grid point 1 ≤ m ≤ M, compute the values of the criterion function {Q̂(λ_m, e^(ℓ)) : 1 ≤ ℓ ≤ L} and calculate the empirical (1−α)-quantile q̂_{α,emp}(λ_m) from them. Step 3: Approximate λ̂_α by λ̂_{α,emp} := q̂_{α,emp}(λ_m̂), where m̂ = min{m : q̂_{α,emp}(λ_{m′}) ≤ λ_{m′} for all m′ ≥ m} if q̂_{α,emp}(λ_M) ≤ λ_M and m̂ = M otherwise. |
| Open Source Code | No | The paper discusses the use of 'glmnet', a third-party tool, but does not provide any statement or link for the authors' own implementation code. For example: 'The lasso paths are computed through glmnet (Friedman et al., 2010) version 2.2.1' |
| Open Datasets | No | In Section 5, the authors state: 'We simulate data from the linear regression model (6) with sample size n = 500 and dimension p ∈ {250, 500, 1000}.' This indicates the use of simulated data, not a publicly available dataset. |
| Dataset Splits | No | The paper describes generating synthetic data for simulations but does not discuss splitting an existing dataset into training, test, or validation sets. |
| Hardware Specification | No | The paper mentions running 'All Monte Carlo experiments' but does not provide any specific hardware details such as CPU, GPU, or memory specifications. |
| Software Dependencies | Yes | The lasso paths are computed through glmnet (Friedman et al., 2010) version 2.2.1... The implementations are in R version 3.5.1. |
| Experiment Setup | Yes | We simulate data from the linear regression model (6) with sample size n = 500 and dimension p ∈ {250, 500, 1000}. The covariate vectors Xi = (Xi1, …, Xip) are independently sampled from a p-dimensional normal distribution with mean 0 and covariance matrix (1−κ)I + κE...We show the simulation results for κ = 0.25 unless indicated differently...The noise variables εi are drawn i.i.d. from a normal distribution with mean 0 and variance σ² = 1. The target vector β has the form β = (c, …, c, 0, …, 0) ...We set SNR = 1 except when we analyze the hypothesis tests from Section 4.2... We implement our estimation method with L = 100 bootstrap replicates... The lasso paths are computed through glmnet (Friedman et al., 2010) version 2.2.1 with an equidistant grid of λ-values and M = 100... All Monte Carlo experiments are based on N = 1000 simulation runs. |
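The three-step algorithm quoted in the Pseudocode row can be sketched in code. This is a minimal illustration, not the authors' implementation: the criterion `Q_hat` below is a simplified stand-in (2‖X⊤e‖_∞/n, independent of λ), whereas the paper's Q̂(λ, e) is defined via the lasso fit at λ; the function name and default arguments are assumptions.

```python
import numpy as np

def estimate_lambda_alpha(X, Y, alpha=0.05, M=100, L=100, seed=0):
    """Sketch of Steps 1-3 for approximating the effective-noise
    quantile lambda_hat_alpha on a grid (simplified criterion)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Step 1: grid 0 < lam_1 < ... < lam_M = lam_bar, where
    # lam_bar = 2*||X^T Y||_inf / n is the smallest lambda whose
    # lasso solution is identically zero
    lam_bar = 2.0 * np.max(np.abs(X.T @ Y)) / n
    grid = np.linspace(lam_bar / M, lam_bar, M)
    E = rng.standard_normal((L, n))  # L draws of the standard normal vector e

    # Simplified stand-in for the paper's criterion Q_hat(lambda, e):
    # here it ignores lambda entirely (assumption for illustration only)
    def Q_hat(lam, e):
        return 2.0 * np.max(np.abs(X.T @ e)) / n

    # Step 2: empirical (1 - alpha)-quantile of the criterion per grid point
    q = np.array([np.quantile([Q_hat(lam, e) for e in E], 1.0 - alpha)
                  for lam in grid])

    # Step 3: m_hat = min{m : q(lam_m') <= lam_m' for all m' >= m},
    # falling back to m_hat = M when q(lam_M) > lam_M
    ok = q <= grid
    m_hat = M - 1
    if ok[-1]:
        while m_hat > 0 and ok[m_hat - 1]:
            m_hat -= 1
    return q[m_hat]  # lambda_hat_{alpha,emp} := q_hat_{alpha,emp}(lam_{m_hat})
```

Replacing `Q_hat` with the paper's residual-based criterion (computed from the glmnet lasso path, as the Software Dependencies row notes) would recover the authors' procedure.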
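The simulation design quoted in the Experiment Setup row can likewise be sketched. The sparsity level `s` and signal height `c` are illustrative placeholders (the paper calibrates c so that SNR = 1), and E is taken here to be the all-ones matrix, an assumption consistent with the stated covariance (1−κ)I + κE.

```python
import numpy as np

def simulate_data(n=500, p=250, kappa=0.25, s=5, c=1.0, sigma=1.0, seed=0):
    """Draw (X, Y) from the linear model Y = X beta + eps with
    X_i ~ N(0, (1-kappa)*I + kappa*E) and eps_i ~ N(0, sigma^2).
    E is assumed to be the all-ones matrix; s and c are illustrative."""
    rng = np.random.default_rng(seed)
    Sigma = (1.0 - kappa) * np.eye(p) + kappa * np.ones((p, p))
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    beta = np.concatenate([np.full(s, c), np.zeros(p - s)])  # (c,...,c,0,...,0)
    eps = rng.normal(0.0, sigma, size=n)
    return X, X @ beta + eps, beta
```

Under this covariance, every pair of covariates has correlation κ, so κ = 0.25 gives moderately correlated designs as in the reported simulations.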