Error Estimation for Randomized Least-Squares Algorithms via the Bootstrap
Authors: Miles Lopes, Shusen Wang, Michael Mahoney
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present experimental results in the contexts of CS and IHS. At a high level, there are two main takeaways: (1) The extrapolation rules accurately predict how estimation error depends on m or t, and this is shown in a range of conditions. (2) In all of the experiments, the algorithms are implemented with only B = 20 bootstrap samples. The fact that favorable results can be obtained with so few samples underscores the point that the method incurs only modest cost in exchange for an accuracy guarantee. |
| Researcher Affiliation | Academia | ¹Department of Statistics, UC Davis; ²ICSI and Department of Statistics, UC Berkeley. |
| Pseudocode | Yes | Algorithm 1. (Error estimate for CS) [...] Algorithm 2. (Error estimate for IHS) |
| Open Source Code | No | The paper does not provide an explicit statement about making its source code available or a link to a code repository. |
| Open Datasets | Yes | Our numerical results are based on four linear regression datasets; two natural, and two synthetic. The natural datasets YearPredictionMSD (n = 463,715, d = 90; abbrev. MSD) and cpusmall (n = 8,192, d = 12; abbrev. CPU) are available at the LIBSVM repository (Chang & Lin, 2011). URL http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/. |
| Dataset Splits | No | The paper describes its experimental setup in terms of generating realizations or runs for evaluation, but it does not specify explicit training, validation, or test dataset splits (e.g., in percentages or counts) for model training or selection, as is common in machine learning contexts. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions the LIBSVM repository as the source for datasets, which implies the use of related software, but it does not specify versions for any software components (libraries, frameworks, operating systems) used to run their experiments. |
| Experiment Setup | Yes | For each value of m in the grid {5d, . . . , 30d}, we generated 1,000 independent SRHT sketching matrices S ∈ ℝ^(m×n), leading to 1,000 realizations of (Ã, b̃, x̃). Then, we computed the .95 sample quantile among the 1,000 values of ‖x̃ − x_opt‖ at each grid point. [...] using an initial sketch size of m0 = 5d, we applied Algorithm 1 to each of the 1,000 realizations of Ã ∈ ℝ^(m0×d) and b̃ ∈ ℝ^(m0) computed previously, leading to 1,000 realizations of the initial error estimate ε̃_init(.05). [...] the IHS algorithm was run 1,000 times, with t = 10 total iterations on each run, and with SRHT sketching matrices being used at each iteration. |
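To make the reported setup concrete, the following is a minimal sketch of the paper's bootstrap idea for classical sketching (in the spirit of Algorithm 1), followed by the 1/√m extrapolation step. It is not the authors' implementation: problem sizes are illustrative, a Gaussian sketch is used as a stand-in for the paper's SRHT matrices, and the Euclidean norm is an assumed choice of error metric. B = 20 matches the number of bootstrap samples used in the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem (sizes are illustrative, not the paper's).
n, d, m = 2000, 10, 200
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Gaussian sketch as a stand-in for the paper's SRHT sketching matrices.
S = rng.standard_normal((m, n)) / np.sqrt(m)
A_sk, b_sk = S @ A, S @ b
x_tilde = np.linalg.lstsq(A_sk, b_sk, rcond=None)[0]

# Bootstrap error estimate: resample rows of the sketched problem with
# replacement, re-solve, and take the (1 - alpha) sample quantile of the
# distances to x_tilde.  Here B = 20, as in the paper's experiments.
B, alpha = 20, 0.05
dists = np.empty(B)
for i in range(B):
    idx = rng.integers(0, m, size=m)  # m rows sampled with replacement
    x_star = np.linalg.lstsq(A_sk[idx], b_sk[idx], rcond=None)[0]
    dists[i] = np.linalg.norm(x_star - x_tilde)  # Euclidean norm (assumed)

eps_hat = np.quantile(dists, 1 - alpha)  # .95 sample quantile

# Extrapolation rule for CS: predict the error at a larger sketch size m1
# from the estimate at m0 = m, using the ~ 1/sqrt(m) error scaling.
m1 = 600
eps_ext = np.sqrt(m / m1) * eps_hat
```

Because only B = 20 small least-squares solves are added on top of the original sketched solve, the estimate illustrates the paper's point that the accuracy guarantee comes at modest extra cost.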