Sparse PCA via Bipartite Matchings

Authors: Megasthenis Asteris, Dimitris Papailiopoulos, Anastasios Kyrillidis, Alexandros G. Dimakis

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our algorithm on real datasets and empirically demonstrate that in many cases it outperforms existing, deflation-based approaches."
Researcher Affiliation | Academia | Megasthenis Asteris, The University of Texas at Austin, megas@utexas.edu; Dimitris Papailiopoulos, University of California, Berkeley, dimitrisp@berkeley.edu; Anastasios Kyrillidis, The University of Texas at Austin, anastasios@utexas.edu; Alexandros G. Dimakis, The University of Texas at Austin, dimakis@austin.utexas.edu
Pseudocode | Yes | Algorithm 1 Sparse PCA (Multiple disjoint components)
input: PSD d×d rank-r matrix A, ε ∈ (0, 1), k ∈ Z⁺
output: X ∈ X_k {Theorem 1}
1: C ← {}
2: [U, Λ] ← EIG(A)
3: for each C ∈ [N_{ε/2}(S_2^{r−1})]^k do
4:   W ← UΛ^{1/2}C  {W ∈ R^{d×k}}
5:   X̂ ← arg max_{X ∈ X_k} Σ_{j=1}^{k} ⟨X_j, W_j⟩²  {Alg. 2}
6:   C ← C ∪ {X̂}
7: end for
8: X ← arg max_{X ∈ C} TR(XᵀAX)
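The core inner step of the pseudocode above (line 5, delegated to the paper's Alg. 2) selects k disjoint s-sparse supports maximizing the sum of squared projections, which the paper casts as a bipartite matching between features and components. The sketch below is an assumption-laden illustration, not the authors' Matlab implementation: the function name `disjoint_sparse_components` is hypothetical, and the column-replication trick for enforcing s features per component assumes d ≥ k·s.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def disjoint_sparse_components(W, s):
    """Sketch: assign features to components with disjoint supports of
    size s each, maximizing the total squared weight, via a bipartite
    matching on the squared entries of W (shape d x k). Assumes d >= k*s."""
    d, k = W.shape
    # Replicate each component's column s times so the assignment gives
    # every component exactly s distinct features; negate to maximize.
    cost = -np.repeat(W ** 2, s, axis=1)          # shape (d, k*s)
    rows, cols = linear_sum_assignment(cost)
    X = np.zeros((d, k))
    for i, c in zip(rows, cols):
        j = c // s                                # recover the component index
        X[i, j] = W[i, j]
    # On each fixed support, the maximizer is W_j restricted to that
    # support and normalized to unit length.
    norms = np.linalg.norm(X, axis=0)
    X /= np.where(norms > 0, norms, 1.0)
    return X
```

Because `linear_sum_assignment` matches each row (feature) to at most one replicated column, the returned supports are disjoint by construction.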
Open Source Code | No | The paper states, 'Our experiments are conducted in a Matlab environment. Due to its nature, our algorithm is easily parallelizable; its prototypical implementation utilizes the Parallel Pool Matlab feature to exploit multicore (or distributed cluster) capabilities.' However, it does not provide any specific links or explicit statements about code availability.
Open Datasets | Yes | "Leukemia Dataset. We evaluate our algorithm on the Leukemia dataset [31]. ... Additional Datasets. We repeat the experiment on multiple datasets, arbitrarily selected from [31]. ... Bag of Words (BoW) Dataset. [31]" Reference [31]: M. Lichman, UCI Machine Learning Repository, 2013.
Dataset Splits | No | The paper mentions several datasets (e.g., Leukemia, Bag of Words) but does not specify how they were split into training, validation, or test sets, nor does it cite standard splits. It mentions 'PEMS TRAIN' and 'ARCENE TRAIN', but these are dataset names, not descriptions of a training split.
Hardware Specification | No | The paper states, 'Our experiments are conducted in a Matlab environment. Due to its nature, our algorithm is easily parallelizable; its prototypical implementation utilizes the Parallel Pool Matlab feature to exploit multicore (or distributed cluster) capabilities.' However, it does not provide specific details on the CPU, GPU models, memory, or other hardware specifications used.
Software Dependencies | No | The paper states that 'Our experiments are conducted in a Matlab environment' but does not specify a version of Matlab or of any other software dependency.
Experiment Setup | Yes | "Unless otherwise specified, it is configured for a rank-4 approximation obtained via truncated SVD. ... We extract k = 5 sparse components, each active on s = 50 features. ... For algorithms that are randomly initialized, we depict best results over multiple random restarts. ... We set a barrier on the execution time of our algorithm at the cost of the theoretical approximation guarantees; the algorithm returns the best result at the time of termination."
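The rank-4 preprocessing quoted above is a plain truncated SVD of the input matrix, after which the k = 5 components with s = 50 active features each are extracted. A minimal sketch, assuming NumPy and a hypothetical helper name `low_rank_approx` (not from the paper):

```python
import numpy as np

def low_rank_approx(A, r=4):
    """Rank-r approximation via truncated SVD; the paper's experiments
    use r = 4 unless otherwise specified."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the r leading singular triplets.
    return (U[:, :r] * s[:r]) @ Vt[:r, :]
```

By the Eckart-Young theorem, this is the best rank-r approximation, with spectral-norm error equal to the (r+1)-th singular value of A.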