reproducibilityindex.ai

Feature-distributed sparse regression: a screen-and-clean approach

Authors: Jiyan Yang, Michael W. Mahoney, Michael Saunders, Yuekai Sun

NeurIPS 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we provide empirical evaluations of our main algorithm SCREENANDCLEAN on synthetic datasets. Figure 1: Plots of the statistical error $\log \|\widetilde{X}(\widehat{\beta} - \beta^{*})\|_2^2$ versus iteration.
Researcher Affiliation	Academia	Authors: Jiyan Yang, Michael W. Mahoney, Michael A. Saunders, Yuekai Sun. Institutions: Stanford University; University of California at Berkeley; University of Michigan.
Pseudocode	Yes	Algorithm 1 Cleaning Stage
Open Source Code	No	The paper mentions 'implement it using Spark' and, in a footnote, links to 'http://spark.apache.org/', which is a third-party tool. It does not provide access to the authors' own source code.
Open Datasets	No	The paper states 'We generate a random instance of a sparse regression problem with size 1000 by 10000' and 'The synthetic datasets used in our experiments are based on model (1). In it, $X \sim N(0, I_D)$ or $X \sim N(0, \Sigma)$ with all predictors equally correlated with correlation 0.7, $\epsilon \sim N(0, 1)$.' This describes how the data were generated; it does not give access to a publicly available dataset.
Dataset Splits	No	The paper states 'For each N, 20 synthetic datasets are generated and the plots are made by averaging the results' but does not specify any train/validation/test splits, percentages, or explicit methodology for data partitioning.
Hardware Specification	Yes	All the experiments are implemented in Matlab on a shared memory machine with 512 GB RAM with 4(6) core Intel Xeon E7540 2 GHz processors.
Software Dependencies	No	The paper mentions using 'TFOCS' and 'Spark' but does not provide specific version numbers for these software dependencies.
Experiment Setup	Yes	Herein, in our experiments, the regularization parameter is set to be $\lambda = 2\|X^T \epsilon\|_\infty$. Also, for SIS and SC, the screening size is set to be 2N. For SC, we run it with sketch size n = 2s log(N) where s = 5 and 3 iterations. For DECO, the dataset is partitioned into m = 3 subsets and it is implemented without the refinement step. The screening size is 2400, sketch size is 700, number of iterations is 3.
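
The data generation quoted under Open Datasets and the regularization choice quoted under Experiment Setup can be sketched as follows. This is a minimal illustration in plain Python, not the authors' Matlab code. Several things here are assumptions that do not come from the table: the function names, the seed, which coefficients are nonzero and their value of 1, and reading the norm in λ as the ℓ∞ norm of the vector Xᵀϵ. Equal correlation between all predictors is produced with a shared Gaussian factor, which is one standard construction and may differ from what the authors did.

```python
import math
import random


def make_sparse_regression(n, d, s=5, rho=0.0, seed=0):
    """Draw (X, y, beta, eps) from y = X beta + eps with eps ~ N(0, 1).

    Rows of X are N(0, I_d) when rho == 0. Otherwise they are N(0, Sigma)
    with unit variances and every off-diagonal entry of Sigma equal to rho.
    """
    rng = random.Random(seed)
    # Assumption: the first s coefficients equal 1 and the rest are 0.
    beta = [1.0] * s + [0.0] * (d - s)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    X, y, eps = [], [], []
    for _ in range(n):
        # One factor shared by all predictors in this row; it gives
        # Var(x_j) = rho + (1 - rho) = 1 and Cov(x_j, x_k) = rho.
        shared = rng.gauss(0.0, 1.0)
        row = [a * shared + b * rng.gauss(0.0, 1.0) for _ in range(d)]
        noise = rng.gauss(0.0, 1.0)
        X.append(row)
        eps.append(noise)
        y.append(sum(row[j] * beta[j] for j in range(s)) + noise)
    return X, y, beta, eps


def lambda_from_noise(X, eps):
    """Return 2 * max_j |(X^T eps)_j|, assuming the l-infinity norm."""
    d = len(X[0])
    return 2.0 * max(
        abs(sum(row[j] * e for row, e in zip(X, eps))) for j in range(d)
    )
```

A call such as `make_sparse_regression(1000, 10000, rho=0.7)` matches the 1000 by 10000 size and the correlation 0.7 quoted above, with the sparsity level left at an illustrative default. At that scale a vectorized numerical library would be much faster than this pure-Python loop. Note that `lambda_from_noise` needs the noise vector itself, so it only applies to synthetic data where ϵ is known.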