Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Can We Trust the Bootstrap in High-dimensions? The Case of Linear Models
Authors: Noureddine El Karoui, Elizabeth Purdom
JMLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show through a mix of numerical and theoretical work that the bootstrap is fraught with problems. We rely on simulation results to demonstrate the practical impact of the failure of the bootstrap. The settings for our simulations and corresponding theoretical analyses are idealized, without many of the common settings of heteroskedasticity, dependency, outliers and so forth that are known to be a problem for bootstrapping. This is intentional, since even these idealized settings are sufficient to demonstrate that the standard bootstrap methods have poor performance. |
| Researcher Affiliation | Collaboration | Noureddine El Karoui (EMAIL, EMAIL): Criteo AI Lab, 32 Rue Blanche, 75009 Paris, France; and Department of Statistics, University of California, Berkeley, CA 94270, USA. Elizabeth Purdom (EMAIL): Department of Statistics, University of California, Berkeley, CA 94270, USA. |
| Pseudocode | No | The paper describes methods and procedures using mathematical notation and prose, but it does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks, nor structured code-like steps. |
| Open Source Code | No | The paper discusses the use of third-party software packages such as 'decon package in R', 'Rmosek package', and 'quantreg function' for its analysis. However, it does not provide any explicit statement or link indicating that the authors' own code or implementation for the methodology described in the paper is publicly available. |
| Open Datasets | No | The paper explicitly states that the data used for the experiments were simulated: 'Simulation of data matrix X, {ϵ_i}_{i=1}^n and construction of data y_i = X_i β + ϵ_i. However, for our simulations, β = 0 (without loss of generality for the results, which are shift equivariant), so y_i = ϵ_i.' While a publicly available Criteo dataset is mentioned in the references, the paper does not state that this dataset was used in the experiments described. |
| Dataset Splits | No | The paper's experiments are based on simulated data, as detailed in Appendix D.1: 'Simulation of data matrix X, {ϵ_i}_{i=1}^n and construction of data y_i = X_i β + ϵ_i.' Since the data are generated afresh for each simulation run, the concept of fixed training, testing, or validation splits of a pre-existing dataset is not applicable. |
| Hardware Specification | No | The paper describes its numerical work and simulations, stating, for example, that 'The values for Figure 3 were generated with Matlab, using cvx and Mosek'. However, it does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run these experiments. |
| Software Dependencies | No | The paper mentions several software packages used for simulations, such as 'lm command in R', 'rlm command in the MASS package', 'MOSEK optimization package', 'Rmosek package', 'rq function that is part of the R package quantreg', 'decon package in R', 'boot package', and 'Matlab, using cvx and Mosek'. However, it does not provide specific version numbers for any of these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | The paper provides specific experimental setup details within Appendix D.1 'Simulation Description'. For example, it states: 'For L2 this was via the lm command in R, for Huber via the rlm command in the MASS package with default settings (k = 1.345)', 'Each bootstrap resampling consisted of R = 1,000 bootstrap samples', and specifies sample sizes ('n = 100, 500, and 1,000') and p/n ratios ('κ was simulated at 0.01, 0.1, 0.3, 0.5'). |
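The setup quoted above (simulated y_i = ϵ_i with β = 0, R = 1,000 bootstrap resamples, varying n and κ = p/n) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Gaussian design and errors, a pairs (case-resampling) bootstrap, and plain least squares via NumPy in place of R's `lm`; all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters echoing the paper's Appendix D.1 (one of several settings):
# beta = 0 without loss of generality, so y_i = eps_i.
n, kappa = 100, 0.3           # kappa = p/n; the paper also uses 0.01, 0.1, 0.5
p = int(kappa * n)
R = 1000                      # bootstrap resamples, as in the paper

X = rng.standard_normal((n, p))
y = rng.standard_normal(n)    # y = X @ beta + eps with beta = 0

def ols_first_coef(X, y):
    # Least-squares fit (the paper uses R's lm); return the first coefficient.
    return np.linalg.lstsq(X, y, rcond=None)[0][0]

# Pairs (case-resampling) bootstrap of the first coefficient.
boot = np.empty(R)
for r in range(R):
    idx = rng.integers(0, n, size=n)
    boot[r] = ols_first_coef(X[idx], y[idx])

# 95% percentile interval; since the true beta_1 = 0, coverage of 0
# across many such replications is what the paper's simulations assess.
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"95% bootstrap CI for beta_1: ({lo:.3f}, {hi:.3f})")
```

Repeating this over many simulated datasets and checking how often the interval covers 0 is the kind of coverage experiment the paper reports; its finding is that such intervals behave poorly as κ grows.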