A Statistical Perspective on Algorithmic Leveraging
Authors: Ping Ma, Michael Mahoney, Bin Yu
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our main empirical contribution is to provide a detailed evaluation of the statistical properties of these algorithmic leveraging estimators on both synthetic and real data sets. These empirical results indicate that our theory is a good predictor of practical performance for both existing algorithms and our two new leveraging algorithms as well as that our two new algorithms lead to improved performance. |
| Researcher Affiliation | Academia | Ping Ma (PINGMA@UGA.EDU), Department of Statistics, University of Georgia, Athens, GA 30602; Michael W. Mahoney (MMAHONEY@ICSI.BERKELEY.EDU), International Computer Science Institute and Dept. of Statistics, University of California at Berkeley, Berkeley, CA 94720; Bin Yu (BINYU@STAT.BERKELEY.EDU), Departments of Statistics and EECS, University of California at Berkeley, Berkeley, CA 94720 |
| Pseudocode | No | The paper describes algorithms (e.g., Subsample LS, SLEV, LEVUNW) in prose, detailing their steps, but does not provide formal pseudocode blocks or algorithms labeled as such. |
| Open Source Code | No | The paper does not include any statements or links regarding the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper uses 'synthetic data' generated based on specified distributions and parameters, but it does not refer to or provide access information for any publicly available or open datasets. |
| Dataset Splits | No | The paper describes generating synthetic data for 1000 runs and gives the parameters of the data generation (e.g., n = 1000 and p = 50), but it does not specify explicit training, validation, or test splits in the conventional sense of partitioning a fixed dataset. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers. |
| Experiment Setup | Yes | We consider synthetic data of 1000 runs generated from y = Xβ + ϵ, where ϵ ∼ N(0, 9I_n), where several different values of n and p, leading to both very rectangular and moderately rectangular matrices X, are considered. The design matrix X is generated from one of three different classes of distributions introduced below. ... SLEV ... where α ∈ (0, 1). ... for n = 1000 and p = 50. |
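The leveraging estimators named in the Pseudocode row and the data model in the Experiment Setup row can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes the usual subsampled least-squares scheme: draw r rows with replacement using probabilities π_i, rescale each sampled row by 1/√π_i, and solve least squares on the subsample. Under that scheme, UNIF uses π_i = 1/n, BLEV uses π_i = h_ii/p (with h_ii the leverage scores), SLEV uses α·h_ii/p + (1 − α)/n with α ∈ (0, 1), and LEVUNW reuses the BLEV probabilities but skips the rescaling. The function names `leverage_scores`, `sampling_probs` and `leveraging_estimate` are hypothetical. The Gaussian design, the true β, the subsample size r, the value of α and the run count are all assumptions, because the row above elides the paper's three design distributions.

```python
import numpy as np


def leverage_scores(X):
    """Diagonal h_ii of the hat matrix H = X (X^T X)^{-1} X^T, via a thin QR of X."""
    Q, _ = np.linalg.qr(X, mode="reduced")  # n x p, orthonormal columns (X full column rank)
    return np.sum(Q**2, axis=1)             # h_ii = ||Q_i||^2; the scores sum to p


def sampling_probs(X, method, alpha=0.9):
    """Row-sampling probabilities for UNIF, BLEV, SLEV or LEVUNW."""
    n, p = X.shape
    if method == "UNIF":
        return np.full(n, 1.0 / n)
    blev = leverage_scores(X) / p                # BLEV: pi_i = h_ii / p
    if method == "SLEV":
        return alpha * blev + (1.0 - alpha) / n  # shrink toward uniform, alpha in (0, 1)
    return blev                                  # BLEV and LEVUNW use the same probabilities


def leveraging_estimate(X, y, pi, r, weighted=True, rng=None):
    """Draw r rows with replacement using probabilities pi, then solve least squares.

    weighted=True rescales each sampled row by 1/sqrt(pi_i) (UNIF, BLEV, SLEV);
    weighted=False solves the plain least-squares problem on the sample (LEVUNW).
    """
    rng = np.random.default_rng(rng)
    idx = rng.choice(X.shape[0], size=r, replace=True, p=pi)
    Xs, ys = X[idx], y[idx]
    if weighted:
        w = 1.0 / np.sqrt(pi[idx])
        Xs, ys = Xs * w[:, None], ys * w
    beta_hat, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 1000, 50                    # sizes quoted in the Experiment Setup row
    r, runs, alpha = 200, 100, 0.9     # hypothetical subsample size, run count and alpha
    X = rng.standard_normal((n, p))    # hypothetical Gaussian design (paper's classes elided)
    beta = np.ones(p)                  # hypothetical true coefficients
    probs = {m: sampling_probs(X, m, alpha) for m in ("UNIF", "BLEV", "SLEV", "LEVUNW")}
    sq_err = {m: 0.0 for m in probs}
    for _ in range(runs):
        y = X @ beta + rng.normal(0.0, 3.0, size=n)  # eps ~ N(0, 9 I_n), so sd = 3
        beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
        for m, pi in probs.items():
            b = leveraging_estimate(X, y, pi, r, weighted=(m != "LEVUNW"), rng=rng)
            sq_err[m] += np.sum((b - beta_ols) ** 2) / runs
    for m, e in sq_err.items():
        print(f"{m:7s} mean squared distance to full-data OLS: {e:.4f}")
```

The sampling probabilities are computed once per method because they depend only on X. Only the noise is redrawn in each run. The reported number, the mean squared distance of each subsample estimate to the full-data OLS fit, is just one possible error measure and is not necessarily the criterion the paper reports.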