Sparse PCA from Sparse Linear Regression
Authors: Guy Bresler, Sung Min Park, Madalina Persu
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide experimental results on simulated data: we test our algorithmic framework on randomly generated synthetic data and compare to other existing algorithms for SPCA. |
| Researcher Affiliation | Collaboration | Guy Bresler (MIT, guy@mit.edu), Sung Min Park (MIT, sp765@mit.edu), Madalina Persu (Two Sigma, MIT, mpersu@mit.edu) |
| Pseudocode | Yes | Algorithm 1 (Q-hypothesis testing). Input: X ∈ ℝ^{d×n}, k; Output: {0, 1}. For i = 1, …, d: β̂ᵢ = SLR(Xᵢ, X₋ᵢ, k); Qᵢ = (1/n)‖Xᵢ − X₋ᵢβ̂ᵢ‖²₂; if Qᵢ > 1 + 3√(k log(d/k)/n), return 1. Return 0. Algorithm 2 (Q-support recovery). Input: X ∈ ℝ^{d×n}, k. Initialize Ŝ = ∅. For i = 1, …, d: β̂ᵢ = SLR(Xᵢ, X₋ᵢ, k); Qᵢ = (1/n)‖Xᵢ − X₋ᵢβ̂ᵢ‖²₂; if Qᵢ > 1 + 3√(k log(d/k)/n), set Ŝ ← Ŝ ∪ {i}. Return Ŝ. |
| Open Source Code | No | The paper does not provide an explicit link to open-source code or state that code is made available. |
| Open Datasets | No | We test our algorithmic framework on randomly generated synthetic data and compare to other existing algorithms for SPCA. The paper states that data is 'randomly generated synthetic data' and does not provide access information for a public dataset. |
| Dataset Splits | No | The paper mentions generating data for experiments but does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper mentions the use of 'standard libraries' and 'Python' for implementation, but it does not specify any hardware details like GPU/CPU models or memory. |
| Software Dependencies | No | The code was implemented in Python using standard libraries. The paper mentions Python and standard libraries but does not provide specific version numbers for any software components. |
| Experiment Setup | Yes | For more details on our experimental setup, including hyperparameter selection, see Appendix D. We randomly generate a spike u ∈ ℝ^d by first choosing a random support of size k and then using random signs for each coordinate (the uniformity ensures Condition (C2) is met). The spike is then scaled appropriately with θ to build the spiked covariance matrix of our normal distribution, from which we draw samples. We study how the performance of six algorithms varies over various values of k for fixed n and d. We compared SPCA via SLR with the following algorithms: diagonal thresholding, a simple baseline; SPCA (ZHT [49]), a fast heuristic also based on the regression idea; the truncated power method of [45], known for both strong theoretical guarantees and empirical performance; and covariance thresholding, which has state-of-the-art theoretical guarantees. We modified each algorithm to return the top k most likely coordinates in the support (rather than thresholding based on a cutoff); for algorithms that compute a candidate eigenvector, we choose the k coordinates largest in absolute value. We repeat for 100 trials and plot the resulting empirical distribution for each statistic. |
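The Q-support-recovery pseudocode in the table can be sketched in Python. The paper treats SLR as a black box, so a simple orthogonal matching pursuit stands in for it here, and the threshold uses the 1 + 3√(k log(d/k)/n) form reconstructed from the table; the function names `omp` and `q_support_recovery` are illustrative, not from the paper.

```python
import numpy as np

def omp(y, A, k):
    """Greedy k-sparse least squares (orthogonal matching pursuit),
    standing in for the SLR black box of Algorithms 1-2."""
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    beta = np.zeros(A.shape[1])
    beta[support] = coef
    return beta

def q_support_recovery(X, k):
    """Algorithm 2 sketch: regress each coordinate on the rest and keep
    those whose residual statistic Q_i exceeds the threshold."""
    d, n = X.shape
    tau = 1 + 3 * np.sqrt(k * np.log(d / k) / n)
    S = set()
    for i in range(d):
        y = X[i]                       # n samples of coordinate i
        A = np.delete(X, i, axis=0).T  # design matrix X_{-i}, shape (n, d-1)
        beta = omp(y, A, k)
        Q = np.sum((y - A @ beta) ** 2) / n  # Q_i = (1/n)||X_i - X_{-i} b||^2
        if Q > tau:
            S.add(i)
    return S
```

The intuition matches the paper's test: under the spiked model, a support coordinate retains residual variance strictly above 1 even after regressing on the other coordinates, while a non-support coordinate's Q_i concentrates around 1, below the threshold.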
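The data-generation recipe in the experiment-setup row (random k-sparse support, random signs, spiked covariance) can be sketched as follows; the function name and the ±1/√k entry normalization (which makes the spike unit-norm) are assumptions, not stated verbatim in the table.

```python
import numpy as np

def spiked_data(d, n, k, theta, rng):
    """Synthetic data as described in the experiments: a k-sparse spike u
    with random support and random-sign entries, spiked covariance
    Sigma = I + theta * u u^T, and n samples X in R^{d x n} from N(0, Sigma)."""
    u = np.zeros(d)
    support = rng.choice(d, size=k, replace=False)
    u[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)  # unit-norm spike
    Sigma = np.eye(d) + theta * np.outer(u, u)
    X = np.linalg.cholesky(Sigma) @ rng.standard_normal((d, n))
    return X, u, set(support.tolist())
```

Varying k for fixed n and d, as in the paper's plots, then amounts to calling this generator per trial and feeding X to each of the compared algorithms.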