Error Estimation for Randomized Least-Squares Algorithms via the Bootstrap
Authors: Miles Lopes, Shusen Wang, Michael Mahoney
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present experimental results in the contexts of CS and IHS. At a high level, there are two main takeaways: (1) The extrapolation rules accurately predict how estimation error depends on m or t, and this is shown in a range of conditions. (2) In all of the experiments, the algorithms are implemented with only B = 20 bootstrap samples. The fact that favorable results can be obtained with so few samples underscores the point that the method incurs only modest cost in exchange for an accuracy guarantee. |
| Researcher Affiliation | Academia | ¹Department of Statistics, UC Davis; ²ICSI and Department of Statistics, UC Berkeley. |
| Pseudocode | Yes | Algorithm 1. (Error estimate for CS) [...] Algorithm 2. (Error estimate for IHS) |
| Open Source Code | No | The paper does not provide an explicit statement about making its source code available or a link to a code repository. |
| Open Datasets | Yes | Our numerical results are based on four linear regression datasets; two natural, and two synthetic. The natural datasets YearPredictionMSD (n = 463,715, d = 90; abbrev. MSD) and cpusmall (n = 8,192, d = 12; abbrev. CPU) are available at the LIBSVM repository (Chang & Lin, 2011). URL http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/. |
| Dataset Splits | No | The paper describes its experimental setup in terms of generating realizations or runs for evaluation, but it does not specify explicit training, validation, or test dataset splits (e.g., in percentages or counts) for model training or selection, as is common in machine learning contexts. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions the LIBSVM repository as the source for datasets, which implies the use of related software, but it does not specify versions for any software components (libraries, frameworks, operating systems) used to run their experiments. |
| Experiment Setup | Yes | For each value of m in the grid {5d, . . . , 30d}, we generated 1,000 independent SRHT sketching matrices S ∈ ℝ^(m×n), leading to 1,000 realizations of (Ã, b̃, x̃). Then, we computed the .95 sample quantile among the 1,000 values of ‖x̃ − x_opt‖ at each grid point. [...] using an initial sketch size of m0 = 5d, we applied Algorithm 1 to each of the 1,000 realizations of Ã ∈ ℝ^(m0×d) and b̃ ∈ ℝ^(m0) computed previously, leading to 1,000 realizations of the initial error estimate ε̃_init(.05). [...] the IHS algorithm was run 1,000 times, with t = 10 total iterations on each run, and with SRHT sketching matrices being used at each iteration. |
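To make the reported setup concrete, the following is a minimal sketch of the paper's bootstrap idea for classical sketching (in the spirit of Algorithm 1), followed by the 1/√m extrapolation step. It is not the authors' implementation: problem sizes are illustrative, a Gaussian sketch is used as a stand-in for the paper's SRHT matrices, and the Euclidean norm is an assumed choice of error metric. B = 20 matches the number of bootstrap samples used in the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem (sizes are illustrative, not the paper's).
n, d, m = 2000, 10, 200
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Gaussian sketch as a stand-in for the paper's SRHT sketching matrices.
S = rng.standard_normal((m, n)) / np.sqrt(m)
A_sk, b_sk = S @ A, S @ b
x_tilde = np.linalg.lstsq(A_sk, b_sk, rcond=None)[0]

# Bootstrap error estimate: resample rows of the sketched problem with
# replacement, re-solve, and take the (1 - alpha) sample quantile of the
# distances to x_tilde.  Here B = 20, as in the paper's experiments.
B, alpha = 20, 0.05
dists = np.empty(B)
for i in range(B):
    idx = rng.integers(0, m, size=m)  # m rows sampled with replacement
    x_star = np.linalg.lstsq(A_sk[idx], b_sk[idx], rcond=None)[0]
    dists[i] = np.linalg.norm(x_star - x_tilde)  # Euclidean norm (assumed)

eps_hat = np.quantile(dists, 1 - alpha)  # .95 sample quantile

# Extrapolation rule for CS: predict the error at a larger sketch size m1
# from the estimate at m0 = m, using the ~ 1/sqrt(m) error scaling.
m1 = 600
eps_ext = np.sqrt(m / m1) * eps_hat
```

Because only B = 20 small least-squares solves are added on top of the original sketched solve, the estimate illustrates the paper's point that the accuracy guarantee comes at modest extra cost.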