Elementary Estimators for High-Dimensional Linear Regression

Authors: Eunho Yang, Aurelie Lozano, Pradeep Ravikumar

ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We analyze our estimators in the high-dimensional setting, and moreover provide empirical corroboration of its performance on simulated as well as real world microarray data. We demonstrate the performance of our elementary estimators on simulated as well as real-world datasets.
Researcher Affiliation | Collaboration | Eunho Yang, EUNHO@CS.UTEXAS.EDU, Department of Computer Science, The University of Texas, Austin, TX 78712, USA; Aurélie C. Lozano, ACLOZANO@US.IBM.COM, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA; Pradeep Ravikumar, PRADEEPR@CS.UTEXAS.EDU, Department of Computer Science, The University of Texas, Austin, TX 78712, USA
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release) for the source code of the methodology described.
Open Datasets | Yes | We used microarray data pertaining to isoprenoid biosynthesis in Arabidopsis thaliana (A. thaliana) provided by Wille et al. (2004).
Dataset Splits | Yes | Thus, as is standard with high-dimensional regularized convex programs, we set the tuning parameters in a holdout-validated fashion, as those that minimize the average squared error on an independent validation set of sample size n. The tuning parameters were selected using 5 fold cross-validation.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., exact GPU/CPU models, processor types, or memory amounts).
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate the experiment.
Experiment Setup | Yes | We set the number of samples to n = 1000, and the number of covariates among p ∈ {1000, 2000}. For each simulation, the entries of the true model coefficient vector θ are set to be 0 everywhere, except for a randomly chosen subset of 10 coefficients, which are chosen independently and uniformly in the interval (1, 3). There are 131 samples. All variables are log transformed. We evaluate the predictive accuracy of the methods by randomly partitioning the data into training and test sets, using 90 observations for training and the remainder for testing. The tuning parameters were selected using 5 fold cross-validation.
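
The experiment setup and dataset splits quoted above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names (sparse_coefficients, train_test_split, k_folds), the fixed seed, and the use of Python's standard random module are all assumptions made for this example. The paper excerpt does not say how the design matrix or noise were generated, so the sketch covers only the sparse coefficient vector, the 90-versus-remainder partition of the 131 microarray samples, and the 5-fold split used for tuning.

```python
import random


def sparse_coefficients(p, k=10, low=1.0, high=3.0, rng=None):
    """Length-p coefficient list: zero everywhere except k random entries.

    The k nonzero entries are drawn independently and uniformly between
    `low` and `high`, mirroring the (1, 3) interval quoted above.
    """
    if rng is None:
        rng = random.Random()
    theta = [0.0] * p
    for j in rng.sample(range(p), k):  # k distinct positions
        theta[j] = rng.uniform(low, high)
    return theta


def train_test_split(n_samples, n_train, rng=None):
    """Randomly partition sample indices into training and test lists."""
    if rng is None:
        rng = random.Random()
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return idx[:n_train], idx[n_train:]


def k_folds(indices, k=5):
    """Yield (fit, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, folds[i]


rng = random.Random(0)  # fixed seed: an assumption, for repeatability only
theta = sparse_coefficients(1000, rng=rng)        # p = 1000 case
train, test = train_test_split(131, 90, rng=rng)  # 90 train, 41 test
folds = list(k_folds(train, k=5))                 # 5 folds over training set
```

With 90 training observations, each of the 5 validation folds holds 18 indices and the corresponding fitting set holds 72. A tuning parameter would then be chosen by fitting on each fitting set, scoring squared error on the matching validation fold, and averaging across folds.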