Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A Statistical Perspective on Algorithmic Leveraging
Authors: Ping Ma, Michael Mahoney, Bin Yu
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our main empirical contribution is to provide a detailed evaluation of the statistical properties of these algorithmic leveraging estimators on both synthetic and real data sets. These empirical results indicate that our theory is a good predictor of practical performance for both existing algorithms and our two new leveraging algorithms as well as that our two new algorithms lead to improved performance. |
| Researcher Affiliation | Academia | Ping Ma, Department of Statistics, University of Georgia, Athens, GA 30602; Michael W. Mahoney, International Computer Science Institute and Dept. of Statistics, University of California at Berkeley, Berkeley, CA 94720; Bin Yu, Departments of Statistics and EECS, University of California at Berkeley, Berkeley, CA 94720 |
| Pseudocode | No | The paper describes algorithms (e.g., Subsample LS, SLEV, LEVUNW) in prose, detailing their steps, but does not provide formal pseudocode blocks or algorithms labeled as such. |
| Open Source Code | No | The paper does not include any statements or links regarding the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper uses 'synthetic data' generated based on specified distributions and parameters, but it does not refer to or provide access information for any publicly available or open datasets. |
| Dataset Splits | No | The paper describes generating synthetic data for 1000 runs and the parameters of the data generation (e.g., n=1000 and p=50), but it does not specify explicit training, validation, or test dataset splits in the conventional sense of partitioning a fixed dataset. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers. |
| Experiment Setup | Yes | We consider synthetic data of 1000 runs generated from y = Xβ + ϵ, where ϵ ∼ N(0, 9I_n), where several different values of n and p, leading to both very rectangular and moderately rectangular matrices X, are considered. The design matrix X is generated from one of three different classes of distributions introduced below. ... SLEV ... where α ∈ (0, 1). ... for n = 1000 and p = 50. |
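
Since the paper provides neither pseudocode nor source code, the setup quoted in the Experiment Setup row can be illustrated with a rough sketch. The following is not the authors' implementation. It assumes a standard Gaussian design matrix (the row above only says X comes from "one of three different classes of distributions"), an all-ones coefficient vector, a subsample size `r = 200`, and `alpha = 0.9`, all of which are placeholders chosen for illustration. It also assumes that SLEV sampling probabilities are a convex combination of the normalized leverage scores and the uniform distribution, which is one common reading of a shrinkage parameter α ∈ (0, 1). The function names are hypothetical.

```python
import numpy as np

def leverage_scores(X):
    # Leverage scores are the squared row norms of Q from a thin QR of X.
    Q, _ = np.linalg.qr(X)
    return np.sum(Q ** 2, axis=1)

def slev_probs(X, alpha):
    # Assumed SLEV form: shrink leverage-based probabilities toward uniform.
    n, p = X.shape
    h = leverage_scores(X)  # sums to p when X has full column rank
    return alpha * h / p + (1.0 - alpha) / n

def subsample_wls(X, y, probs, r, rng):
    # Draw r rows with replacement, then solve a least-squares problem
    # in which each sampled row is rescaled by 1 / sqrt(r * prob).
    idx = rng.choice(X.shape[0], size=r, replace=True, p=probs)
    w = 1.0 / np.sqrt(r * probs[idx])
    beta, *_ = np.linalg.lstsq(X[idx] * w[:, None], y[idx] * w, rcond=None)
    return beta

rng = np.random.default_rng(0)
n, p = 1000, 50                       # sizes quoted in the table
X = rng.standard_normal((n, p))       # ASSUMPTION: Gaussian design
beta_true = np.ones(p)                # ASSUMPTION: placeholder coefficients
y = X @ beta_true + rng.normal(0.0, 3.0, size=n)  # noise variance 9, sd 3

probs = slev_probs(X, alpha=0.9)      # ASSUMPTION: alpha value
beta_hat = subsample_wls(X, y, probs, r=200, rng=rng)  # ASSUMPTION: r
```

Repeating the last two lines over many independent draws (the table mentions 1000 runs) would give an empirical picture of the estimator's bias and variance under these assumed settings.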