Structured Sparse Regression via Greedy Hard Thresholding
Authors: Prateek Jain, Nikhil Rao, Inderjit S. Dhillon
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both real and synthetic data validate our claims and demonstrate that the proposed methods are orders of magnitude faster than other greedy and convex relaxation techniques for learning with group-structured sparsity. In this section, we empirically compare and contrast our proposed group IHT methods against the existing approaches to solve the overlapping group sparsity problem. |
| Researcher Affiliation | Collaboration | Prateek Jain (Microsoft Research India), Nikhil Rao (Technicolor), Inderjit Dhillon |
| Pseudocode | Yes | Algorithm 1 IHT for Group-sparsity, Algorithm 2 Greedy Projection, Algorithm 3 Greedy Projections for SoG |
| Open Source Code | No | The paper does not provide any statement about releasing source code for the methodology described, nor does it include any links to a code repository. |
| Open Datasets | Yes | Tumor Classification, Breast Cancer Dataset. We next compare the aforementioned methods on a gene selection problem for breast cancer tumor classification. We use the data used in [8]. We ran a 5-fold cross validation scheme to choose parameters, where we varied one parameter over {2^-5, 2^-4, ..., 2^3}, k ∈ {2, 5, 10, 15, 20, 50, 100}, and another parameter over {2^3, 2^4, ..., 2^13}. Figure 2 (Right) shows that the vanilla hard thresholding method is competitive despite performing approximate projections, and the method with full corrections obtains the best performance among the methods considered. We randomly chose 15% of the data to test on. Footnote 2, 'download at http://cbio.ensmp.fr/~ljacob/', provides a direct link to the dataset. |
| Dataset Splits | Yes | We ran a 5-fold cross validation scheme to choose parameters, where we varied one parameter over {2^-5, 2^-4, ..., 2^3}, k ∈ {2, 5, 10, 15, 20, 50, 100}, and another parameter over {2^3, 2^4, ..., 2^13}. We randomly chose 15% of the data to test on. |
| Hardware Specification | Yes | All relevant hyper-parameters were chosen via a grid search, and experiments were run on a MacBook laptop with a 2.5 GHz processor and 16 GB memory. |
| Software Dependencies | No | The paper mentions 'Cholesky decompositions via the backslash operator in MATLAB' but does not specify version numbers for MATLAB or any other software components. |
| Experiment Setup | Yes | All relevant hyper-parameters were chosen via a grid search, and experiments were run on a MacBook laptop with a 2.5 GHz processor and 16 GB memory. For the Breast Cancer Dataset, one parameter was varied over {2^-5, 2^-4, ..., 2^3}, k over {2, 5, 10, 15, 20, 50, 100}, and another over {2^3, 2^4, ..., 2^13}. For Synthetic Data: 'We generated M = 1000 groups of contiguous indices of size 25; the last 5 entries of one group overlap with the first 5 of the next. We randomly set 50 of these to be active, populated by uniform [-1, 1] entries. This yields w* ∈ R^p, p ≈ 22000. X ∈ R^(n×p) where n = 5000 and X_ij i.i.d. N(0, 1). Each measurement is corrupted with Additive White Gaussian Noise (AWGN) with standard deviation λ = 0.1.' |
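The evaluation protocol reported in the Dataset Splits and Experiment Setup rows (a 15% random test holdout, 5-fold cross validation, and a grid over the three reported parameter ranges) can be sketched as follows. This is a minimal NumPy sketch, not the authors' code: the sample count `n` is a hypothetical placeholder, and `a` and `b` stand in for the two hyper-parameter symbols that were lost in extraction.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 295  # hypothetical sample count, not stated in the extracted text
idx = rng.permutation(n)

# Hold out 15% of the data for testing, as the paper reports.
n_test = int(round(0.15 * n))
test_idx, train_idx = idx[:n_test], idx[n_test:]

# 5-fold cross validation over the remaining training portion.
folds = np.array_split(rng.permutation(train_idx), 5)

# Hyper-parameter grid as reported; `a` and `b` are placeholder names
# for the two parameters whose symbols were garbled in extraction.
grid = list(itertools.product(
    [2.0 ** e for e in range(-5, 4)],   # a ∈ {2^-5, ..., 2^3}
    [2, 5, 10, 15, 20, 50, 100],        # k
    [2.0 ** e for e in range(3, 14)],   # b ∈ {2^3, ..., 2^13}
))
```

Each grid point would be scored by averaging validation error over the 5 folds, with the best configuration evaluated once on `test_idx`.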
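The synthetic data recipe in the Experiment Setup row can likewise be sketched in NumPy. This is an illustrative reconstruction under stated assumptions: the RNG seed is arbitrary, `n` is reduced from the paper's 5000 only to keep the sketch cheap to run, and note that this literal overlapping-group construction yields p = 20005, on the order of the ≈22000 the extracted text reports.

```python
import numpy as np

rng = np.random.default_rng(0)

# Paper's setup: M = 1000 contiguous groups of size 25, each sharing its
# last 5 indices with the first 5 of the next group.
M, group_size, overlap = 1000, 25, 5
stride = group_size - overlap
groups = [np.arange(g * stride, g * stride + group_size) for g in range(M)]
p = int(groups[-1][-1]) + 1  # total dimension (20005 under this construction)

# 50 randomly chosen active groups, populated with Uniform[-1, 1] entries.
w_star = np.zeros(p)
for g in rng.choice(M, size=50, replace=False):
    w_star[groups[g]] = rng.uniform(-1.0, 1.0, size=group_size)

# Gaussian design and AWGN with standard deviation 0.1. The paper uses
# n = 5000; a smaller n is used here only so the sketch runs quickly.
n = 500
X = rng.standard_normal((n, p))
y = X @ w_star + 0.1 * rng.standard_normal(n)
```

Because adjacent active groups can overlap, the number of nonzeros in `w_star` is at most 50 × 25 but can be slightly smaller in effect, which is exactly the overlapping-group structure the paper's projections exploit.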