Feature-distributed sparse regression: a screen-and-clean approach

Authors: Jiyan Yang, Michael W. Mahoney, Michael Saunders, Yuekai Sun

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'In this section, we provide empirical evaluations of our main algorithm SCREENANDCLEAN on synthetic datasets.' 'Figure 1: Plots of the statistical error $\log \lVert \widetilde{X}(\widehat{\beta} - \beta^*) \rVert_2^2$ versus iteration.'
Researcher Affiliation | Academia | Authors: Jiyan Yang, Michael W. Mahoney, Michael A. Saunders, Yuekai Sun. Institutions: Stanford University, University of California at Berkeley, University of Michigan.
Pseudocode | Yes | 'Algorithm 1 Cleaning Stage'
Open Source Code | No | The paper says 'implement it using Spark' and links to 'http://spark.apache.org/', which is a third-party tool. It does not provide access to the authors' own source code.
Open Datasets | No | The paper states 'We generate a random instance of a sparse regression problem with size 1000 by 10000' and 'The synthetic datasets used in our experiments are based on model (1). In it, $X \sim N(0, I_D)$ or $X \sim N(0, \Sigma)$ with all predictors equally correlated with correlation 0.7, $\epsilon \sim N(0, 1)$.' This describes how the data were generated; it does not point to a publicly available dataset.
Dataset Splits | No | The paper states 'For each N, 20 synthetic datasets are generated and the plots are made by averaging the results' but does not specify any train/validation/test splits, percentages, or explicit methodology for data partitioning.
Hardware Specification | Yes | 'All the experiments are implemented in Matlab on a shared memory machine with 512 GB RAM with 4(6) core Intel Xeon E7540 2 GHz processors.'
Software Dependencies | No | The paper mentions using 'TFOCS' and 'Spark' but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | 'Herein, in our experiments, the regularization parameter is set to be $\lambda = 2\lVert X^T \epsilon \rVert_\infty$. Also, for SIS and SC, the screening size is set to be 2N. For SC, we run it with sketch size n = 2s log(N) where s = 5 and 3 iterations. For DECO, the dataset is partitioned into m = 3 subsets and it is implemented without the refinement step.' 'The screening size is 2400, sketch size is 700, number of iterations is 3.'
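
The data-generation recipe quoted in the table can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' Matlab code: the excerpt does not reproduce model (1), so a linear model y = Xβ + ϵ is assumed, and the function names, the support size of 5, and the unit nonzero coefficients are hypothetical choices. The regularization parameter uses the true noise vector, so it can be computed this way only in a simulation.

```python
import numpy as np


def make_sparse_regression(n_samples, n_features, sparsity=5, rho=0.0, seed=0):
    """Draw (X, y, beta, eps) from an assumed linear model y = X beta + eps.

    rho = 0 gives X ~ N(0, I); rho > 0 gives equally correlated predictors
    with unit variance and pairwise correlation rho (0.7 in the quoted setup).
    """
    rng = np.random.default_rng(seed)
    own = rng.standard_normal((n_samples, n_features))
    if rho == 0.0:
        X = own
    else:
        # A factor shared by all columns induces pairwise correlation rho:
        # Var = rho + (1 - rho) = 1 and Cov = rho for distinct columns.
        shared = rng.standard_normal((n_samples, 1))
        X = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own

    # Assumption: a random support with unit coefficients (not from the source).
    beta = np.zeros(n_features)
    support = rng.choice(n_features, size=sparsity, replace=False)
    beta[support] = 1.0

    eps = rng.standard_normal(n_samples)  # eps ~ N(0, 1)
    y = X @ beta + eps
    return X, y, beta, eps


def noise_based_lambda(X, eps):
    """Regularization level 2 * ||X^T eps||_inf, using the known noise."""
    return 2.0 * np.max(np.abs(X.T @ eps))
```

For example, `make_sparse_regression(1000, 10000, rho=0.7)` would produce one instance at the 1000 by 10000 size mentioned in the table, and `noise_based_lambda(X, eps)` the matching regularization level.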