reproducibilityindex.ai

Simultaneous Inference for Massive Data: Distributed Bootstrap

Authors: Yang Yu, Shih-Kang Chao, Guang Cheng

ICML 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Simulations validate our theory. Section 4 presents simulation results that corroborate our theoretical findings.
Researcher Affiliation	Academia	¹Department of Statistics, Purdue University, USA; ²Department of Statistics, University of Missouri, USA.
Pseudocode	Yes	Algorithm 1 DistBoots(method, e, {g_j}_{j=1}^k, e)
Open Source Code	No	The paper does not contain any explicit statement about making the source code available or provide a link to a code repository.
Open Datasets	No	For linear model, we generate ε independently from N(0, 1), simulate the response from y = x^⊤β + ε; for GLM, we consider logistic regression and obtain each response from y ~ Ber(1/(1 + exp[-x^⊤β])). This indicates the data was simulated, not drawn from a public dataset.
Dataset Splits	No	The paper describes generating synthetic data for simulations and drawing bootstrap samples, but does not provide specific train/validation/test dataset splits in terms of percentages or sample counts.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation.
Experiment Setup	Yes	Fix the total sample size N = 2^16. Choose d from {2^1, 2^3, 2^5, 2^7} and k from {2^0, 2^1, ..., 2^11}. β is drawn uniformly from [-0.5, 0.5]^d and kept fixed for all replications. ... At each replication, we draw B = 500 bootstrap samples, from which we calculate the 95% empirical quantile to further obtain the 95% simultaneous confidence interval ...
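
The data generation and the quantile step quoted in the table can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`draw_beta`, `simulate`, `simultaneous_ci`) are hypothetical, the covariates are assumed to be standard normal because the excerpt does not state the design distribution, and the bootstrap statistics are taken as an input array because the excerpt does not describe how they are computed.

```python
import numpy as np

def draw_beta(d, seed=0):
    """Draw beta once, uniformly from [-0.5, 0.5]^d; reuse it in every replication."""
    return np.random.default_rng(seed).uniform(-0.5, 0.5, size=d)

def simulate(beta, n_total=2**16, k=2**4, model="linear", rng=None):
    """Generate one replication and split it evenly across k machines."""
    rng = np.random.default_rng() if rng is None else rng
    d = beta.shape[0]
    # ASSUMPTION: standard normal covariates (design not given in the excerpt).
    X = rng.standard_normal((n_total, d))
    eta = X @ beta
    if model == "linear":
        # y = x^T beta + eps, with eps ~ N(0, 1)
        y = eta + rng.standard_normal(n_total)
    elif model == "logistic":
        # y ~ Ber(1 / (1 + exp(-x^T beta)))
        y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    else:
        raise ValueError(f"unknown model: {model}")
    # N and k are powers of two, so the split is exact.
    return np.split(X, k), np.split(y, k)

def simultaneous_ci(estimate, boot_stats, level=0.95):
    """Band of half-width c around the estimate, where c is the empirical
    `level` quantile of the B bootstrap statistics (one scalar per draw)."""
    c = np.quantile(boot_stats, level)
    return estimate - c, estimate + c
```

With N = 2^16 and k = 2^4, each machine holds 2^12 observations. In the paper's setup `boot_stats` would hold B = 500 values per replication; how those values are produced is left to Algorithm 1 and is not reproduced here.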