Structured Sparse Regression via Greedy Hard Thresholding
Authors: Prateek Jain, Nikhil Rao, Inderjit S. Dhillon
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both real and synthetic data validate our claims and demonstrate that the proposed methods are orders of magnitude faster than other greedy and convex relaxation techniques for learning with group-structured sparsity. In this section, we empirically compare and contrast our proposed group IHT methods against the existing approaches to solve the overlapping group sparsity problem. |
| Researcher Affiliation | Collaboration | Prateek Jain (Microsoft Research India), Nikhil Rao (Technicolor), Inderjit Dhillon |
| Pseudocode | Yes | Algorithm 1 IHT for Group-sparsity, Algorithm 2 Greedy Projection, Algorithm 3 Greedy Projections for SoG |
| Open Source Code | No | The paper does not provide any statement about releasing source code for the methodology described, nor does it include any links to a code repository. |
| Open Datasets | Yes | Tumor Classification, Breast Cancer Dataset. We next compare the aforementioned methods on a gene selection problem for breast cancer tumor classification. We use the data used in [8]. We ran a 5-fold cross validation scheme to choose parameters, where we varied one parameter over {2^-5, 2^-4, ..., 2^3}, k ∈ {2, 5, 10, 15, 20, 50, 100}, and another parameter over {2^3, 2^4, ..., 2^13}. Figure 2 (Right) shows that the vanilla hard thresholding method is competitive despite performing approximate projections, and the method with full corrections obtains the best performance among the methods considered. We randomly chose 15% of the data to test on. Footnote 2, 'download at http://cbio.ensmp.fr/~ljacob/', provides a direct link to the dataset. |
| Dataset Splits | Yes | We ran a 5-fold cross validation scheme to choose parameters, where we varied one parameter over {2^-5, 2^-4, ..., 2^3}, k ∈ {2, 5, 10, 15, 20, 50, 100}, and another parameter over {2^3, 2^4, ..., 2^13}. We randomly chose 15% of the data to test on. |
| Hardware Specification | Yes | All relevant hyper-parameters were chosen via a grid search, and experiments were run on a MacBook laptop with a 2.5 GHz processor and 16 GB memory. |
| Software Dependencies | No | The paper mentions 'Cholesky decompositions via the backslash operator in MATLAB' but does not specify version numbers for MATLAB or any other software components. |
| Experiment Setup | Yes | All relevant hyper-parameters were chosen via a grid search, and experiments were run on a MacBook laptop with a 2.5 GHz processor and 16 GB memory. For the Breast Cancer Dataset, one parameter was varied over {2^-5, 2^-4, ..., 2^3}, k over {2, 5, 10, 15, 20, 50, 100}, and another over {2^3, 2^4, ..., 2^13}. For Synthetic Data: 'We generated M = 1000 groups of contiguous indices of size 25; the last 5 entries of one group overlap with the first 5 of the next. We randomly set 50 of these to be active, populated by uniform [-1, 1] entries. This yields w* ∈ R^p, p ≈ 22000. X ∈ R^(n×p) where n = 5000 and X_ij i.i.d. N(0, 1). Each measurement is corrupted with Additive White Gaussian Noise (AWGN) with standard deviation λ = 0.1.' |
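The evaluation protocol reported in the Dataset Splits and Experiment Setup rows (a 15% random test holdout, 5-fold cross validation, and a grid over the three reported parameter ranges) can be sketched as follows. This is a minimal NumPy sketch, not the authors' code: the sample count `n` is a hypothetical placeholder, and `a` and `b` stand in for the two hyper-parameter symbols that were lost in extraction.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 295  # hypothetical sample count, not stated in the extracted text
idx = rng.permutation(n)

# Hold out 15% of the data for testing, as the paper reports.
n_test = int(round(0.15 * n))
test_idx, train_idx = idx[:n_test], idx[n_test:]

# 5-fold cross validation over the remaining training portion.
folds = np.array_split(rng.permutation(train_idx), 5)

# Hyper-parameter grid as reported; `a` and `b` are placeholder names
# for the two parameters whose symbols were garbled in extraction.
grid = list(itertools.product(
    [2.0 ** e for e in range(-5, 4)],   # a ∈ {2^-5, ..., 2^3}
    [2, 5, 10, 15, 20, 50, 100],        # k
    [2.0 ** e for e in range(3, 14)],   # b ∈ {2^3, ..., 2^13}
))
```

Each grid point would be scored by averaging validation error over the 5 folds, with the best configuration evaluated once on `test_idx`.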
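The synthetic data recipe in the Experiment Setup row can likewise be sketched in NumPy. This is an illustrative reconstruction under stated assumptions: the RNG seed is arbitrary, `n` is reduced from the paper's 5000 only to keep the sketch cheap to run, and note that this literal overlapping-group construction yields p = 20005, on the order of the ≈22000 the extracted text reports.

```python
import numpy as np

rng = np.random.default_rng(0)

# Paper's setup: M = 1000 contiguous groups of size 25, each sharing its
# last 5 indices with the first 5 of the next group.
M, group_size, overlap = 1000, 25, 5
stride = group_size - overlap
groups = [np.arange(g * stride, g * stride + group_size) for g in range(M)]
p = int(groups[-1][-1]) + 1  # total dimension (20005 under this construction)

# 50 randomly chosen active groups, populated with Uniform[-1, 1] entries.
w_star = np.zeros(p)
for g in rng.choice(M, size=50, replace=False):
    w_star[groups[g]] = rng.uniform(-1.0, 1.0, size=group_size)

# Gaussian design and AWGN with standard deviation 0.1. The paper uses
# n = 5000; a smaller n is used here only so the sketch runs quickly.
n = 500
X = rng.standard_normal((n, p))
y = X @ w_star + 0.1 * rng.standard_normal(n)
```

Because adjacent active groups can overlap, the number of nonzeros in `w_star` is at most 50 × 25 but can be slightly smaller in effect, which is exactly the overlapping-group structure the paper's projections exploit.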