Bootstrap in High Dimension with Low Computation
Authors: Henry Lam, Zhenyuan Liu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our theoretical results and compare the performance of our approach with other benchmarks via a range of experiments. In addition to theoretical bounds, we investigate the empirical performances of bootstraps using few resamples on large-scale problems, including high-dimensional linear regression, high-dimensional logistic regression, computational simulation modeling, and a real-world data set RCV1-v2 (Lewis et al., 2004). |
| Researcher Affiliation | Academia | Henry Lam 1 Zhenyuan Liu 1 1Department of Industrial Engineering and Operations Research, Columbia University, New York, NY, USA. |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. Methods are described using prose and mathematical formulas. |
| Open Source Code | No | No explicit statement about the release of their open-source code or a link to a code repository was found in the paper. |
| Open Datasets | Yes | A real-world data example: We run logistic regression on the RCV1-v2 data in Lewis et al. (2004). |
| Dataset Splits | No | The paper describes generating new datasets from ground truth distributions for each repetition and uses a real-world dataset (RCV1-v2) but does not specify any train/validation/test splits. |
| Hardware Specification | Yes | In this example, one Monte Carlo replication to obtain each resample estimate takes around 4 minutes in the virtual machine e2-highmem-2 in Google Cloud Platform. Therefore, the cheap bootstrap only requires 4 minutes to obtain a statistically valid interval, but the standard bootstrap methods are still far from the nominal coverage even after more than a 40-minute run. Some examples have larger scale and thus are run in the virtual machine e2-highmem-8 with larger memory and better CPU, whose running time will be starred (*). |
| Software Dependencies | No | The paper mentions using “sklearn.linear_model.LogisticRegression (a machine learning package in Python)” but does not specify the version numbers for Python or the scikit-learn library. |
| Experiment Setup | Yes | In each example above, our targets are 95%-level confidence intervals for the target parameters. We vary the number of resamples B from 1 to 10 in all examples and report the running time... For each setup except the real-data example, we run 1000 experimental repetitions, each time generating a new data set from the ground truth distribution and constructing the intervals. (Specific model configurations, e.g. for linear regression: "The first, second and last 1/3 components of β = (β_i)_{i=1}^p are 0, 2, 1 respectively.") |
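
The "few resamples" setup in the table (B from 1 to 10, 95% intervals) follows the cheap-bootstrap construction studied in the paper: the interval is centered at the original estimate with half-width t_{B,1-α/2}·S, where S² is the mean squared deviation of the B resample estimates from the original estimate. A minimal sketch under those assumptions (function and variable names here are illustrative, not from the paper's code, which is not released):

```python
import numpy as np
from scipy import stats

def cheap_bootstrap_ci(data, estimator, B=5, alpha=0.05, rng=None):
    """Cheap-bootstrap confidence interval using only B resamples.

    Interval: psi_hat +/- t_{B, 1-alpha/2} * S, where
    S^2 = (1/B) * sum_b (psi*_b - psi_hat)^2 and t is a
    Student-t quantile with B degrees of freedom.
    """
    rng = np.random.default_rng(rng)
    n = len(data)
    psi_hat = estimator(data)  # estimate on the original data
    # B nonparametric resample estimates (sampling rows with replacement)
    resample_ests = np.array([
        estimator(data[rng.integers(0, n, size=n)]) for _ in range(B)
    ])
    s = np.sqrt(np.mean((resample_ests - psi_hat) ** 2))
    t = stats.t.ppf(1 - alpha / 2, df=B)
    return psi_hat - t * s, psi_hat + t * s

# Toy usage: a 95% interval for a mean from only B = 5 resamples
data = np.random.default_rng(0).normal(loc=1.0, size=500)
lo, hi = cheap_bootstrap_ci(data, np.mean, B=5)
```

Because only B + 1 model fits are needed in total, a statistically valid interval is available after a handful of resamples, which is the source of the timing comparison quoted in the Hardware Specification row.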