Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Bootstrap in High Dimension with Low Computation
Authors: Henry Lam, Zhenyuan Liu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our theoretical results and compare the performance of our approach with other benchmarks via a range of experiments. In addition to theoretical bounds, we investigate the empirical performances of bootstraps using few resamples on large-scale problems, including high-dimensional linear regression, high-dimensional logistic regression, computational simulation modeling, and a real-world data set RCV1-v2 (Lewis et al., 2004). |
| Researcher Affiliation | Academia | Henry Lam¹, Zhenyuan Liu¹. ¹Department of Industrial Engineering and Operations Research, Columbia University, New York, NY, USA. |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. Methods are described using prose and mathematical formulas. |
| Open Source Code | No | No explicit statement about the release of their open-source code or a link to a code repository was found in the paper. |
| Open Datasets | Yes | A real-world data example: We run logistic regression on the RCV1-v2 data in Lewis et al. (2004). |
| Dataset Splits | No | The paper describes generating new datasets from ground truth distributions for each repetition and uses a real-world dataset (RCV1-v2) but does not specify any train/validation/test splits. |
| Hardware Specification | Yes | In this example, one Monte Carlo replication to obtain each resample estimate takes around 4 minutes in the virtual machine e2-highmem-2 in Google Cloud Platform. Therefore, the cheap bootstrap only requires 4 minutes to obtain a statistically valid interval, but the standard bootstrap methods are still far from the nominal coverage even after more than a 40-minute run. Some examples have larger scale and thus are run in the virtual machine e2-highmem-8 with larger memory and better CPU, whose running time will be starred (*). |
| Software Dependencies | No | The paper mentions using “sklearn.linear_model.LogisticRegression (a machine learning package in Python)” but does not specify the version numbers for Python or the scikit-learn library. |
| Experiment Setup | Yes | In each example above, our targets are 95%-level confidence intervals for the target parameters. We vary the number of resamples B from 1 to 10 in all examples and report the running time... For each setup except the real-data example, we run 1000 experimental repetitions, each time generating a new data set from the ground truth distribution and construct the intervals. (Specific model configurations like: “The first, second and last 1/3 components of $\beta = (\beta_i)_{i=1}^p$ are 0, 2, 1 respectively.” for linear regression.) |
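
The experiment setup quoted above (95% intervals, B between 1 and 10 resamples, repeated draws from a known ground truth to measure coverage) can be sketched as follows. This is a minimal illustration, not the authors' code, which the paper does not release. It assumes the cheap bootstrap interval has the form "point estimate ± t quantile with B degrees of freedom × root mean squared deviation of the resample estimates from the point estimate"; the table above does not state that formula. The names `cheap_bootstrap_ci` and `estimate_coverage` are hypothetical, and a sample mean stands in for the paper's regression estimators to keep the sketch dependency-free.

```python
import random
import statistics

# 0.975 quantiles of Student's t for 1..10 degrees of freedom (standard table values),
# hard-coded so the sketch needs only the standard library.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def cheap_bootstrap_ci(data, estimator, num_resamples, rng):
    """Return a 95% interval from only `num_resamples` bootstrap resamples.

    Assumed form: point +/- t_{B, 0.975} * S, where S^2 is the average squared
    deviation of the B resample estimates from the original point estimate.
    """
    n = len(data)
    point = estimator(data)
    squared_deviation = 0.0
    for _ in range(num_resamples):
        resample = rng.choices(data, k=n)  # draw n points with replacement
        squared_deviation += (estimator(resample) - point) ** 2
    spread = (squared_deviation / num_resamples) ** 0.5
    half_width = T_975[num_resamples] * spread
    return point - half_width, point + half_width


def estimate_coverage(true_value, sample_size, num_resamples, repetitions, seed=0):
    """Fraction of repetitions whose interval contains the ground truth.

    Each repetition generates a fresh data set from the ground-truth
    distribution (here a unit-variance Gaussian, an illustrative choice).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        data = [rng.gauss(true_value, 1.0) for _ in range(sample_size)]
        lower, upper = cheap_bootstrap_ci(data, statistics.fmean, num_resamples, rng)
        hits += lower <= true_value <= upper
    return hits / repetitions


if __name__ == "__main__":
    for num_resamples in (1, 2, 5, 10):
        coverage = estimate_coverage(2.0, 200, num_resamples, repetitions=1000)
        print(f"B={num_resamples}: empirical coverage {coverage:.3f}")
```

To move closer to the paper's setting, the `estimator` argument could be replaced by a function that fits a regression on the resampled rows and returns one coefficient; the interval construction itself would stay unchanged under the assumption stated above.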