A Residual Bootstrap for High-Dimensional Regression with Near Low-Rank Designs

Authors: Miles Lopes

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In four different settings of n, p, and the decay parameter η, we compared the nominal 90% confidence intervals (CIs) of four methods: oracle, ridge, normal, and OLS, to be described below. In each setting, we generated N1 := 100 random designs X with i.i.d. rows drawn from N(0, Σ), where λ_j(Σ) = j^(−η), j = 1, …, p, and the eigenvectors of Σ were drawn randomly by setting them to be the Q factor in a QR decomposition of a standard p×p Gaussian matrix. Then, for each realization of X, we generated N2 := 1000 realizations of Y according to the model (1), where β = 1/‖1‖₂ ∈ R^p, and F0 is the centered t distribution on 5 degrees of freedom, rescaled to have standard deviation σ = 0.1. Table 1: Comparison of nominal 90% confidence intervals.
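The quoted design-generation step can be sketched as follows. This is a hedged NumPy reconstruction, not the author's code; the function name `make_design` and the small Cholesky jitter are our own choices.

```python
import numpy as np

def make_design(n, p, eta, rng):
    """Draw one random design X with i.i.d. rows from N(0, Sigma)."""
    # Polynomially decaying spectrum: lambda_j(Sigma) = j^(-eta).
    eigvals = np.arange(1, p + 1) ** (-float(eta))
    # Random eigenvectors: Q factor of a QR decomposition of a p x p Gaussian matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
    Sigma = (Q * eigvals) @ Q.T  # Q diag(lambda) Q^T
    # Rows of X are i.i.d. N(0, Sigma); a tiny jitter keeps the Cholesky factor stable.
    L = np.linalg.cholesky(Sigma + 1e-12 * np.eye(p))
    return rng.standard_normal((n, p)) @ L.T
```

Responses Y would then follow the linear model Y = Xβ + ε, with the ε_i drawn i.i.d. from the rescaled centered t(5) distribution described in the quote.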
Researcher Affiliation | Academia | Miles E. Lopes, Department of Statistics, University of California, Berkeley, Berkeley, CA 94720, mlopes@stat.berkeley.edu
Pseudocode | Yes | Resampling algorithm. To summarize the discussion above, if B is a user-specified number of bootstrap replicates, our proposed method for approximating Ψ_ρ(F0; c) is given below.
1. Select ρ and ϱ, and compute the residuals ê(ϱ) = Y − Xβ̂_ϱ.
2. Compute the centered distribution function F̂_ϱ, putting mass 1/n at each ê_i(ϱ) − ē(ϱ).
3. For j = 1, …, B: draw a vector ε ∈ R^n of n i.i.d. samples from F̂_ϱ, and compute z_j := c⊤(X⊤X + ρ I_{p×p})^{−1} X⊤ε.
4. Return the empirical distribution of z_1, …, z_B.
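A minimal NumPy sketch of the four resampling steps, assuming draws from F̂_ϱ are implemented by resampling the centered residuals with replacement; the function name and signature are illustrative, not the author's implementation:

```python
import numpy as np

def residual_bootstrap(X, Y, c, rho, varrho, B, rng):
    """Approximate Psi_rho(F0; c) by resampling centered ridge residuals."""
    n, p = X.shape
    G = X.T @ X
    # Step 1: residuals of the ridge estimator at regularization level varrho.
    beta_varrho = np.linalg.solve(G + varrho * np.eye(p), X.T @ Y)
    e = Y - X @ beta_varrho
    # Step 2: center the residuals so that F-hat_varrho has mean zero.
    e_centered = e - e.mean()
    # Steps 3-4: draw n i.i.d. samples from F-hat_varrho and map them through
    # the linear functional  eps -> c^T (X^T X + rho I)^{-1} X^T eps.
    A = np.linalg.solve(G + rho * np.eye(p), X.T)  # shape (p, n)
    z = np.empty(B)
    for j in range(B):
        eps = rng.choice(e_centered, size=n, replace=True)
        z[j] = c @ (A @ eps)
    return z  # empirical distribution of z_1, ..., z_B
```

Precomputing the (p, n) matrix A outside the loop means each bootstrap replicate costs only one matrix-vector product rather than a fresh linear solve.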
Open Source Code | No | The paper does not provide any statements or links indicating the availability of open-source code for the described methodology.
Open Datasets | No | The paper describes a data generation process for simulations ('we generated N1 := 100 random designs X with i.i.d. rows drawn from N(0, Σ)...'), but does not use a publicly available dataset, nor does it provide access information for the generated data.
Dataset Splits | Yes | To choose the parameters ρ and ϱ for a given X and Y, we first computed r̂ as the value that optimized the MSPE of a ridge estimator β̂_r with respect to 5-fold cross validation; i.e. cross validation was performed for every distinct pair (X, Y).
Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers.
Experiment Setup | Yes | To choose the parameters ρ and ϱ for a given X and Y, we first computed r̂ as the value that optimized the MSPE of a ridge estimator β̂_r with respect to 5-fold cross validation; i.e. cross validation was performed for every distinct pair (X, Y). We then put ϱ = 5r̂ and ρ = 0.1r̂, as we found the prefactors 5 and 0.1 to work adequately across various settings. (Optimizing ϱ with respect to MSPE is motivated by Theorems 1, 2, and 3. Also, choosing ρ to be somewhat smaller than ϱ conforms with the constraints on θ and γ in Theorem 4.)
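The quoted tuning rule could look like the sketch below. The candidate grid and function name are our assumptions; only the 5-fold CV criterion and the prefactors 5 and 0.1 come from the quoted text.

```python
import numpy as np

def choose_regularization(X, Y, grid, rng, n_folds=5):
    """Pick r-hat by 5-fold CV on ridge MSPE; return (varrho, rho) = (5*r-hat, 0.1*r-hat)."""
    n, p = X.shape
    folds = np.array_split(rng.permutation(n), n_folds)
    mspe = np.zeros(len(grid))
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[m] for m in range(n_folds) if m != k])
        G = X[train].T @ X[train]
        for i, r in enumerate(grid):
            # Ridge fit on the training folds, prediction error on the held-out fold.
            beta = np.linalg.solve(G + r * np.eye(p), X[train].T @ Y[train])
            mspe[i] += np.mean((Y[test] - X[test] @ beta) ** 2)
    r_hat = grid[int(np.argmin(mspe))]
    return 5.0 * r_hat, 0.1 * r_hat  # (varrho, rho)
```

As the quoted parenthetical notes, keeping ρ an order of magnitude below ϱ respects the constraints on θ and γ in Theorem 4 while still regularizing the bootstrap contrast.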