Bootstrap in High Dimension with Low Computation

Authors: Henry Lam, Zhenyuan Liu

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our theoretical results and compare the performance of our approach with other benchmarks via a range of experiments. In addition to theoretical bounds, we investigate the empirical performance of bootstraps using few resamples on large-scale problems, including high-dimensional linear regression, high-dimensional logistic regression, computational simulation modeling, and a real-world data set, RCV1-v2 (Lewis et al., 2004).
Researcher Affiliation | Academia | Henry Lam, Zhenyuan Liu; Department of Industrial Engineering and Operations Research, Columbia University, New York, NY, USA.
Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. Methods are described using prose and mathematical formulas.
Open Source Code | No | No explicit statement about the release of open-source code, and no link to a code repository, was found in the paper.
Open Datasets | Yes | A real-world data example: logistic regression is run on the RCV1-v2 data set of Lewis et al. (2004).
Dataset Splits | No | The paper describes generating new data sets from ground-truth distributions for each repetition and uses a real-world data set (RCV1-v2), but does not specify any train/validation/test splits.
Hardware Specification | Yes | In this example, one Monte Carlo replication to obtain each resample estimate takes around 4 minutes on the e2-highmem-2 virtual machine in Google Cloud Platform. Therefore, the cheap bootstrap requires only 4 minutes to obtain a statistically valid interval, whereas the standard bootstrap methods remain far from the nominal coverage even after a run of more than 40 minutes. Some examples have larger scale and are therefore run on the e2-highmem-8 virtual machine, which has more memory and a better CPU; their running times are starred (*).
Software Dependencies | No | The paper mentions using "sklearn.linear_model.LogisticRegression (a machine learning package in Python)" but does not specify version numbers for Python or the scikit-learn library.
Experiment Setup | Yes | In each example above, our targets are 95%-level confidence intervals for the target parameters. We vary the number of resamples B from 1 to 10 in all examples and report the running time... For each setup except the real-data example, we run 1000 experimental repetitions, each time generating a new data set from the ground-truth distribution and constructing the intervals. (Specific model configurations are given, e.g., for linear regression: "The first, second, and last 1/3 of the components of β = (β_i)_{i=1}^p are 0, 2, 1, respectively.")
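To make the setup above concrete, the following is a minimal sketch of constructing a 95%-level interval from a small number of resamples B, in the style of a cheap-bootstrap interval that calibrates the width with a t-quantile on B degrees of freedom rather than resample quantiles. The function name, the use of the sample mean as the estimator, and the synthetic data are illustrative assumptions, not the paper's actual experiment code.

```python
import numpy as np
from scipy import stats

def cheap_bootstrap_ci(data, estimator, B=5, alpha=0.05, seed=0):
    """Illustrative cheap-bootstrap-style confidence interval.

    Uses only B bootstrap resamples; the half-width is a t-quantile
    with B degrees of freedom times the root-mean-square deviation of
    the resample estimates from the full-data estimate.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    psi_hat = estimator(data)  # point estimate on the full data set
    # Estimate on each of B resamples drawn with replacement.
    psi_star = np.array([
        estimator(data[rng.integers(0, n, size=n)])
        for _ in range(B)
    ])
    s = np.sqrt(np.mean((psi_star - psi_hat) ** 2))
    t_q = stats.t.ppf(1 - alpha / 2, df=B)  # t-quantile, B degrees of freedom
    return psi_hat - t_q * s, psi_hat + t_q * s

# Toy usage: interval for the mean of synthetic data (assumed setup).
data = np.random.default_rng(1).normal(loc=2.0, scale=1.0, size=500)
lo, hi = cheap_bootstrap_ci(data, np.mean, B=5)
```

With B as small as 1 to 10, only a handful of resample estimates are computed, which is what makes the per-interval cost a few Monte Carlo replications rather than the hundreds required by standard bootstraps.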