A Residual Bootstrap for High-Dimensional Regression with Near Low-Rank Designs

Authors: Miles Lopes

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In four different settings of n, p, and the decay parameter η, we compared the nominal 90% confidence intervals (CIs) of four methods: oracle, ridge, normal, and OLS, to be described below. In each setting, we generated N1 := 100 random designs X with i.i.d. rows drawn from N(0, Σ), where λ_j(Σ) = j^(−η), j = 1, …, p, and the eigenvectors of Σ were drawn randomly by setting them to be the Q factor in a QR decomposition of a standard p×p Gaussian matrix. Then, for each realization of X, we generated N2 := 1000 realizations of Y according to the model (1), where β = 1/‖1‖₂ ∈ R^p, and F0 is the centered t distribution on 5 degrees of freedom, rescaled to have standard deviation σ = 0.1. Table 1: Comparison of nominal 90% confidence intervals.
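The quoted design-generation step can be sketched as follows. This is a hedged NumPy reconstruction, not the author's code; the function name `make_design` and the small Cholesky jitter are our own choices.

```python
import numpy as np

def make_design(n, p, eta, rng):
    """Draw one random design X with i.i.d. rows from N(0, Sigma)."""
    # Polynomially decaying spectrum: lambda_j(Sigma) = j^(-eta).
    eigvals = np.arange(1, p + 1) ** (-float(eta))
    # Random eigenvectors: Q factor of a QR decomposition of a p x p Gaussian matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
    Sigma = (Q * eigvals) @ Q.T  # Q diag(lambda) Q^T
    # Rows of X are i.i.d. N(0, Sigma); a tiny jitter keeps the Cholesky factor stable.
    L = np.linalg.cholesky(Sigma + 1e-12 * np.eye(p))
    return rng.standard_normal((n, p)) @ L.T
```

Responses Y would then follow the linear model Y = Xβ + ε, with the ε_i drawn i.i.d. from the rescaled centered t(5) distribution described in the quote.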
Researcher Affiliation | Academia | Miles E. Lopes, Department of Statistics, University of California, Berkeley, Berkeley, CA 94720, mlopes@stat.berkeley.edu
Pseudocode | Yes | Resampling algorithm. To summarize the discussion above, if B is a user-specified number of bootstrap replicates, our proposed method for approximating Ψ_ρ(F0; c) is given below.
1. Select ρ and ϱ, and compute the residuals ê(ϱ) = Y − Xβ̂_ϱ.
2. Compute the centered distribution function F̂_ϱ, putting mass 1/n at each ê_i(ϱ) − ē(ϱ).
3. For j = 1, …, B: draw a vector ε ∈ R^n of n i.i.d. samples from F̂_ϱ, and compute z_j := c⊤(X⊤X + ρ I_{p×p})^{−1} X⊤ε.
4. Return the empirical distribution of z_1, …, z_B.
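A minimal NumPy sketch of the four resampling steps, assuming draws from F̂_ϱ are implemented by resampling the centered residuals with replacement; the function name and signature are illustrative, not the author's implementation:

```python
import numpy as np

def residual_bootstrap(X, Y, c, rho, varrho, B, rng):
    """Approximate Psi_rho(F0; c) by resampling centered ridge residuals."""
    n, p = X.shape
    G = X.T @ X
    # Step 1: residuals of the ridge estimator at regularization level varrho.
    beta_varrho = np.linalg.solve(G + varrho * np.eye(p), X.T @ Y)
    e = Y - X @ beta_varrho
    # Step 2: center the residuals so that F-hat_varrho has mean zero.
    e_centered = e - e.mean()
    # Steps 3-4: draw n i.i.d. samples from F-hat_varrho and map them through
    # the linear functional  eps -> c^T (X^T X + rho I)^{-1} X^T eps.
    A = np.linalg.solve(G + rho * np.eye(p), X.T)  # shape (p, n)
    z = np.empty(B)
    for j in range(B):
        eps = rng.choice(e_centered, size=n, replace=True)
        z[j] = c @ (A @ eps)
    return z  # empirical distribution of z_1, ..., z_B
```

Precomputing the (p, n) matrix A outside the loop means each bootstrap replicate costs only one matrix-vector product rather than a fresh linear solve.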
Open Source Code | No | The paper does not provide any statements or links indicating the availability of open-source code for the described methodology.
Open Datasets | No | The paper describes a data generation process for simulations ('we generated N1 := 100 random designs X with i.i.d. rows drawn from N(0, Σ)...'), but does not use a publicly available dataset, nor does it provide access information for the generated data.
Dataset Splits | Yes | To choose the parameters ρ and ϱ for a given X and Y, we first computed r̂ as the value that optimized the MSPE of a ridge estimator β̂_r with respect to 5-fold cross validation; i.e. cross validation was performed for every distinct pair (X, Y).
Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers.
Experiment Setup | Yes | To choose the parameters ρ and ϱ for a given X and Y, we first computed r̂ as the value that optimized the MSPE of a ridge estimator β̂_r with respect to 5-fold cross validation; i.e. cross validation was performed for every distinct pair (X, Y). We then put ϱ = 5r̂ and ρ = 0.1r̂, as we found the prefactors 5 and 0.1 to work adequately across various settings. (Optimizing ϱ with respect to MSPE is motivated by Theorems 1, 2, and 3. Also, choosing ρ to be somewhat smaller than ϱ conforms with the constraints on θ and γ in Theorem 4.)
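The quoted tuning rule could look like the sketch below. The candidate grid and function name are our assumptions; only the 5-fold CV criterion and the prefactors 5 and 0.1 come from the quoted text.

```python
import numpy as np

def choose_regularization(X, Y, grid, rng, n_folds=5):
    """Pick r-hat by 5-fold CV on ridge MSPE; return (varrho, rho) = (5*r-hat, 0.1*r-hat)."""
    n, p = X.shape
    folds = np.array_split(rng.permutation(n), n_folds)
    mspe = np.zeros(len(grid))
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[m] for m in range(n_folds) if m != k])
        G = X[train].T @ X[train]
        for i, r in enumerate(grid):
            # Ridge fit on the training folds, prediction error on the held-out fold.
            beta = np.linalg.solve(G + r * np.eye(p), X[train].T @ Y[train])
            mspe[i] += np.mean((Y[test] - X[test] @ beta) ** 2)
    r_hat = grid[int(np.argmin(mspe))]
    return 5.0 * r_hat, 0.1 * r_hat  # (varrho, rho)
```

As the quoted parenthetical notes, keeping ρ an order of magnitude below ϱ respects the constraints on θ and γ in Theorem 4 while still regularizing the bootstrap contrast.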