Simultaneous Inference for Massive Data: Distributed Bootstrap

Authors: Yang Yu, Shih-Kang Chao, Guang Cheng

ICML 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Simulations validate our theory. Section 4 presents simulation results that corroborate our theoretical findings. |
| Researcher Affiliation | Academia | (1) Department of Statistics, Purdue University, USA; (2) Department of Statistics, University of Missouri, USA. |
| Pseudocode | Yes | Algorithm 1: DistBoots(method, e, {g_j}_{j=1}^k, e) |
| Open Source Code | No | The paper does not contain any explicit statement about making the source code available, nor does it provide a link to a code repository. |
| Open Datasets | No | For linear model, we generate e independently from N(0, 1), simulate the response from y = x^T beta + e; for GLM, we consider logistic regression and obtain each response from y ~ Ber(1/(1 + exp[-x^T beta])). The data were therefore simulated rather than taken from a public dataset. |
| Dataset Splits | No | The paper describes generating synthetic data for simulations and drawing bootstrap samples, but does not give train/validation/test splits as percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used to run the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks used for the implementation. |
| Experiment Setup | Yes | Fix the total sample size N = 2^16. Choose d from {2^1, 2^3, 2^5, 2^7} and k from {2^0, 2^1, ..., 2^11}. beta is determined by drawing uniformly from [-0.5, 0.5]^d and kept fixed for all replications. ...At each replication, we draw B = 500 bootstrap samples, from which we calculate the 95% empirical quantile to further obtain the 95% simultaneous confidence interval... |
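
The simulation setup quoted above can be illustrated with a minimal Python sketch. It is not the authors' code, and several pieces are assumptions: the covariate distribution (standard normal here, since the summary does not state it), the pooled least-squares fit used in place of the paper's distributed estimator, and the exact form of the machine-level multiplier bootstrap statistic, which is written in the spirit of Algorithm 1 but not copied from it. The function names `draw_beta`, `simulate`, and `simultaneous_ci` are hypothetical, and the interval is computed for the linear model only.

```python
import numpy as np


def draw_beta(d, rng):
    """Draw the true coefficients once, uniformly from [-0.5, 0.5]^d."""
    return rng.uniform(-0.5, 0.5, size=d)


def simulate(N, beta, rng, model="linear"):
    """Generate (X, y) for the linear or logistic model described above."""
    # ASSUMPTION: standard normal covariates; the summary does not give the design.
    X = rng.standard_normal((N, beta.size))
    eta = X @ beta
    if model == "linear":
        y = eta + rng.standard_normal(N)  # noise e ~ N(0, 1)
    elif model == "logistic":
        y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)
    else:
        raise ValueError("model must be 'linear' or 'logistic'")
    return X, y


def simultaneous_ci(X, y, k, rng, B=500, level=0.95):
    """Sup-norm simultaneous interval for a linear model split over k machines.

    ASSUMPTION: a machine-level Gaussian multiplier bootstrap on centered
    per-machine gradients; the paper's exact statistic may differ.
    """
    N, d = X.shape
    if k < 2 or N % k != 0:
        raise ValueError("need k >= 2 and N divisible by k")
    n = N // k
    # Pooled least squares stands in for the paper's distributed estimator.
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    # Inverse Hessian of the squared loss (1 / 2N) * ||y - X b||^2.
    theta = np.linalg.inv(X.T @ X / N)
    resid = X @ beta_hat - y
    # Gradient of the local loss on each machine, evaluated at beta_hat: shape (k, d).
    g = np.einsum("jnd,jn->jd", X.reshape(k, n, d), resid.reshape(k, n)) / n
    g_centered = g - g.mean(axis=0)
    # One row of N(0, 1) multipliers per bootstrap draw.
    eps = rng.standard_normal((B, k))
    W = np.sqrt(n / k) * (eps @ g_centered) @ theta.T  # shape (B, d)
    stats = np.abs(W).max(axis=1)  # sup-norm of each bootstrap draw
    c = np.quantile(stats, level)  # empirical quantile across the B draws
    half = c / np.sqrt(N)
    return beta_hat - half, beta_hat + half


# Small demo; the paper fixes N = 2**16, a smaller N keeps the demo quick.
rng = np.random.default_rng(0)
beta = draw_beta(8, rng)
X, y = simulate(2**12, beta, rng)
lo, hi = simultaneous_ci(X, y, k=2**4, rng=rng)
print("all coordinates covered:", bool(np.all((lo <= beta) & (beta <= hi))))
```

Because the bootstrap resamples only k machine-level gradients, this sketch needs k of at least 2 and becomes unreliable for small k. The interval has the same half-width in every coordinate because it is calibrated on the maximum absolute deviation across coordinates.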