reproducibilityindex.ai

A Statistical Perspective on Algorithmic Leveraging

Authors: Ping Ma, Michael Mahoney, Bin Yu

ICML 2014

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our main empirical contribution is to provide a detailed evaluation of the statistical properties of these algorithmic leveraging estimators on both synthetic and real data sets. These empirical results indicate that our theory is a good predictor of practical performance for both existing algorithms and our two new leveraging algorithms as well as that our two new algorithms lead to improved performance.
Researcher Affiliation	Academia	Ping Ma (PINGMA@UGA.EDU), Department of Statistics, University of Georgia, Athens, GA 30602; Michael W. Mahoney (MMAHONEY@ICSI.BERKELEY.EDU), International Computer Science Institute and Dept. of Statistics, University of California at Berkeley, Berkeley, CA 94720; Bin Yu (BINYU@STAT.BERKELEY.EDU), Departments of Statistics and EECS, University of California at Berkeley, Berkeley, CA 94720
Pseudocode	No	The paper describes algorithms (e.g., Subsample LS, SLEV, LEVUNW) in prose, detailing their steps, but does not provide formal pseudocode blocks or algorithms labeled as such.
Open Source Code	No	The paper does not include any statements or links regarding the availability of open-source code for the described methodology.
Open Datasets	No	The paper uses 'synthetic data' generated based on specified distributions and parameters, but it does not refer to or provide access information for any publicly available or open datasets.
Dataset Splits	No	The paper describes generating synthetic data for 1000 runs and the parameters of the data generation (e.g., n=1000 and p=50), but it does not specify explicit training, validation, or test dataset splits in the conventional sense of partitioning a fixed dataset.
Hardware Specification	No	The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies	No	The paper does not list any specific software dependencies with version numbers.
Experiment Setup	Yes	We consider synthetic data of 1000 runs generated from y = Xβ + ϵ, where ϵ ∼ N(0, 9I_n), where several different values of n and p, leading to both very rectangular and moderately rectangular matrices X, are considered. The design matrix X is generated from one of three different classes of distributions introduced below. ... SLEV ... where α ∈ (0, 1). ... for n = 1000 and p = 50.
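
The experiment setup quoted above can be illustrated with a minimal Python sketch. This is not the authors' code (the report notes that none was released), and several details are assumptions made only for illustration: a standard Gaussian design matrix standing in for one of the three unspecified distribution classes, a true β of all ones, a subsample size r = 200, α = 0.9, and SLEV sampling probabilities taken as a convex combination of normalized leverage scores and uniform probabilities. The helper names (leverage_scores, slev_probabilities, subsample_wls) are hypothetical.

```python
import numpy as np

def leverage_scores(X):
    # Leverage scores are the squared row norms of Q from a thin QR of X.
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

def slev_probabilities(X, alpha):
    # Assumed SLEV form: alpha * (leverage / p) + (1 - alpha) * (1 / n).
    n, p = X.shape
    h = leverage_scores(X)
    return alpha * h / p + (1.0 - alpha) / n

def subsample_wls(X, y, probs, r, rng):
    # Draw r rows with replacement, then solve least squares with each
    # sampled row rescaled by 1 / sqrt(pi_i) (i.e. weights 1 / pi_i).
    idx = rng.choice(X.shape[0], size=r, replace=True, p=probs)
    w = 1.0 / np.sqrt(probs[idx])
    beta, *_ = np.linalg.lstsq(X[idx] * w[:, None], y[idx] * w, rcond=None)
    return beta

rng = np.random.default_rng(0)
n, p = 1000, 50                      # sizes quoted in the setup
X = rng.standard_normal((n, p))      # assumption: Gaussian design
beta_true = np.ones(p)               # assumption: true coefficients
y = X @ beta_true + rng.normal(0.0, 3.0, size=n)  # noise variance 9

probs = slev_probabilities(X, alpha=0.9)          # assumption: alpha = 0.9
beta_hat = subsample_wls(X, y, probs, r=200, rng=rng)  # assumption: r = 200
```

Because the leverage scores of a full-rank n × p matrix sum to p, the mixed probabilities sum to one for any α in (0, 1). Repeating the last two lines over many independent draws would approximate the kind of 1000-run evaluation the paper describes.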