Sparse PCA from Sparse Linear Regression
Authors: Guy Bresler, Sung Min Park, Madalina Persu
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide experimental results on simulated data: we test our algorithmic framework on randomly generated synthetic data and compare to other existing algorithms for SPCA. |
| Researcher Affiliation | Collaboration | Guy Bresler (MIT, guy@mit.edu), Sung Min Park (MIT, sp765@mit.edu), Madalina Persu (Two Sigma, MIT, mpersu@mit.edu) |
| Pseudocode | Yes | Algorithm 1 (Q-hypothesis testing). Input: X ∈ ℝ^{d×n}, k; Output: {0, 1}. For i = 1, …, d: β̂ᵢ = SLR(Xᵢ, X₋ᵢ, k); Qᵢ = (1/n)‖Xᵢ − X₋ᵢβ̂ᵢ‖²₂; if Qᵢ > 1 + 3√(k log(d/k)/n), return 1. Return 0. Algorithm 2 (Q-support recovery). Input: X ∈ ℝ^{d×n}, k. Initialize Ŝ = ∅. For i = 1, …, d: β̂ᵢ = SLR(Xᵢ, X₋ᵢ, k); Qᵢ = (1/n)‖Xᵢ − X₋ᵢβ̂ᵢ‖²₂; if Qᵢ > 1 + 3√(k log(d/k)/n), set Ŝ ← Ŝ ∪ {i}. Return Ŝ. |
| Open Source Code | No | The paper does not provide an explicit link to open-source code or state that code is made available. |
| Open Datasets | No | We test our algorithmic framework on randomly generated synthetic data and compare to other existing algorithms for SPCA. The paper states that data is 'randomly generated synthetic data' and does not provide access information for a public dataset. |
| Dataset Splits | No | The paper mentions generating data for experiments but does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper mentions the use of 'standard libraries' and 'Python' for implementation, but it does not specify any hardware details like GPU/CPU models or memory. |
| Software Dependencies | No | The code was implemented in Python using standard libraries. The paper mentions Python and standard libraries but does not provide specific version numbers for any software components. |
| Experiment Setup | Yes | For more details on our experimental setup, including hyperparameter selection, see Appendix D. We randomly generate a spike u ∈ ℝ^d by first choosing a random support of size k and then using random signs for each coordinate (the uniformity ensures Condition (C2) is met). The spike is then scaled appropriately with θ to build the spiked covariance matrix of our normal distribution, from which we draw samples. We study how the performance of six algorithms varies over various values of k for fixed n and d. We compared SPCA via SLR with the following algorithms: diagonal thresholding, a simple baseline; SPCA (ZHT [49]), a fast heuristic also based on the regression idea; the truncated power method of [45], known for both strong theoretical guarantees and empirical performance; and covariance thresholding, which has state-of-the-art theoretical guarantees. We modified each algorithm to return the top k most likely coordinates in the support (rather than thresholding based on a cutoff); for algorithms that compute a candidate eigenvector, we choose the k coordinates largest in absolute value. We repeat for 100 trials and plot the resulting empirical distribution for each statistic. |
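The Q-support-recovery pseudocode in the table can be sketched in Python. The paper treats SLR as a black box, so a simple orthogonal matching pursuit stands in for it here, and the threshold uses the 1 + 3√(k log(d/k)/n) form reconstructed from the table; the function names `omp` and `q_support_recovery` are illustrative, not from the paper.

```python
import numpy as np

def omp(y, A, k):
    """Greedy k-sparse least squares (orthogonal matching pursuit),
    standing in for the SLR black box of Algorithms 1-2."""
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    beta = np.zeros(A.shape[1])
    beta[support] = coef
    return beta

def q_support_recovery(X, k):
    """Algorithm 2 sketch: regress each coordinate on the rest and keep
    those whose residual statistic Q_i exceeds the threshold."""
    d, n = X.shape
    tau = 1 + 3 * np.sqrt(k * np.log(d / k) / n)
    S = set()
    for i in range(d):
        y = X[i]                       # n samples of coordinate i
        A = np.delete(X, i, axis=0).T  # design matrix X_{-i}, shape (n, d-1)
        beta = omp(y, A, k)
        Q = np.sum((y - A @ beta) ** 2) / n  # Q_i = (1/n)||X_i - X_{-i} b||^2
        if Q > tau:
            S.add(i)
    return S
```

The intuition matches the paper's test: under the spiked model, a support coordinate retains residual variance strictly above 1 even after regressing on the other coordinates, while a non-support coordinate's Q_i concentrates around 1, below the threshold.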
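The data-generation recipe in the experiment-setup row (random k-sparse support, random signs, spiked covariance) can be sketched as follows; the function name and the ±1/√k entry normalization (which makes the spike unit-norm) are assumptions, not stated verbatim in the table.

```python
import numpy as np

def spiked_data(d, n, k, theta, rng):
    """Synthetic data as described in the experiments: a k-sparse spike u
    with random support and random-sign entries, spiked covariance
    Sigma = I + theta * u u^T, and n samples X in R^{d x n} from N(0, Sigma)."""
    u = np.zeros(d)
    support = rng.choice(d, size=k, replace=False)
    u[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)  # unit-norm spike
    Sigma = np.eye(d) + theta * np.outer(u, u)
    X = np.linalg.cholesky(Sigma) @ rng.standard_normal((d, n))
    return X, u, set(support.tolist())
```

Varying k for fixed n and d, as in the paper's plots, then amounts to calling this generator per trial and feeding X to each of the compared algorithms.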