Feature-distributed sparse regression: a screen-and-clean approach
Authors: Jiyan Yang, Michael W. Mahoney, Michael Saunders, Yuekai Sun
NeurIPS 2016

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide empirical evaluations of our main algorithm SCREENANDCLEAN on synthetic datasets. Figure 1: Plots of the statistical error $\log \lVert \widetilde{X}(\widehat{\beta} - \beta) \rVert_2^2$ versus iteration. |
| Researcher Affiliation | Academia | Jiyan Yang, Michael W. Mahoney, Michael A. Saunders, Yuekai Sun; Stanford University, University of California at Berkeley, University of Michigan |
| Pseudocode | Yes | Algorithm 1 Cleaning Stage |
| Open Source Code | No | The paper mentions 'implement it using Spark' with a footnote linking to 'http://spark.apache.org/', which is a third-party tool. It does not provide access to the authors' own source code. |
| Open Datasets | No | The paper states 'We generate a random instance of a sparse regression problem with size 1000 by 10000' and 'The synthetic datasets used in our experiments are based on model (1). In it, $X \sim N(0, I_D)$ or $X \sim N(0, \Sigma)$ with all predictors equally correlated with correlation 0.7, $\epsilon \sim N(0, 1)$.' This describes how the data was generated but does not provide access to a publicly available dataset. |
| Dataset Splits | No | The paper states 'For each N, 20 synthetic datasets are generated and the plots are made by averaging the results' but does not specify any train/validation/test splits, percentages, or explicit methodology for data partitioning. |
| Hardware Specification | Yes | All the experiments are implemented in Matlab on a shared memory machine with 512 GB RAM with 4(6) core Intel Xeon E7540 2 GHz processors. |
| Software Dependencies | No | The paper mentions using 'TFOCS' and 'Spark' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Herein, in our experiments, the regularization parameter is set to be $\lambda = 2 \lVert X^T \epsilon \rVert_\infty$. Also, for SIS and SC, the screening size is set to be 2N. For SC, we run it with sketch size $n = 2s \log(N)$ where s = 5 and 3 iterations. For DECO, the dataset is partitioned into m = 3 subsets and it is implemented without the refinement step. The screening size is 2400, sketch size is 700, number of iterations is 3. |
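
The data generation and parameter choices quoted in the table can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' Matlab or Spark code. It assumes model (1) is the linear model y = Xβ + ε with an s-sparse β whose nonzero entries equal 1, reads the regularization parameter as twice the infinity norm of Xᵀε, takes N to be the number of samples, uses the natural logarithm, and rounds the sketch size up. All function names are hypothetical, and the sizes are kept far smaller than the 1000 by 10000 instance so that the example runs quickly.

```python
import math
import random


def make_design(n_rows, n_cols, rho=0.0, rng=None):
    """Gaussian design with unit-variance columns and pairwise correlation rho.

    rho = 0 gives independent N(0, 1) entries; rho = 0.7 matches the
    equicorrelated setting quoted in the table.
    """
    if rng is None:
        rng = random.Random(0)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    X = []
    for _ in range(n_rows):
        shared = rng.gauss(0.0, 1.0)  # factor shared by every column in this row
        X.append([a * shared + b * rng.gauss(0.0, 1.0) for _ in range(n_cols)])
    return X


def make_regression(n_rows, n_cols, s, rho=0.0, seed=0):
    """Assumed form of model (1): y = X beta + eps with an s-sparse beta."""
    rng = random.Random(seed)
    X = make_design(n_rows, n_cols, rho, rng)
    support = rng.sample(range(n_cols), s)
    beta = [0.0] * n_cols
    for j in support:
        beta[j] = 1.0  # assumption: unit-magnitude nonzero coefficients
    eps = [rng.gauss(0.0, 1.0) for _ in range(n_rows)]
    y = [sum(X[i][j] * beta[j] for j in support) + eps[i] for i in range(n_rows)]
    return X, y, beta, eps


def lambda_from_noise(X, eps):
    """Assumed reading of the table entry: 2 * max_j |(X^T eps)_j|."""
    n_cols = len(X[0])
    return 2.0 * max(
        abs(sum(row[j] * e for row, e in zip(X, eps))) for j in range(n_cols)
    )


def sketch_size(s, n_samples):
    """n = 2 s log(N), natural log, rounded up (rounding is an assumption)."""
    return math.ceil(2 * s * math.log(n_samples))


X, y, beta, eps = make_regression(n_rows=50, n_cols=200, s=5, rho=0.7, seed=1)
lam = lambda_from_noise(X, eps)
n_sketch = sketch_size(s=5, n_samples=1000)
```

Note that this choice of λ depends on the noise vector ε, which is known only because the data is synthetic. On real data it would have to be replaced by an estimate or chosen by cross-validation.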