Sparse PCA via Bipartite Matchings

Authors: Megasthenis Asteris, Dimitris Papailiopoulos, Anastasios Kyrillidis, Alexandros G. Dimakis

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our algorithm on real datasets and empirically demonstrate that in many cases it outperforms existing, deflation-based approaches."
Researcher Affiliation | Academia | Megasthenis Asteris, The University of Texas at Austin, megas@utexas.edu; Dimitris Papailiopoulos, University of California, Berkeley, dimitrisp@berkeley.edu; Anastasios Kyrillidis, The University of Texas at Austin, anastasios@utexas.edu; Alexandros G. Dimakis, The University of Texas at Austin, dimakis@austin.utexas.edu
Pseudocode | Yes | Algorithm 1 Sparse PCA (Multiple disjoint components)
input: PSD d×d rank-r matrix A, ε ∈ (0, 1), k ∈ Z⁺
output: X ∈ X_k {Theorem 1}
1: C ← {}
2: [U, Λ] ← EIG(A)
3: for each C ∈ [N_{ε/2}(S_2^{r−1})]^k do
4:   W ← UΛ^{1/2}C  {W ∈ R^{d×k}}
5:   X̂ ← arg max_{X ∈ X_k} Σ_{j=1}^{k} ⟨X_j, W_j⟩²  {Alg. 2}
6:   C ← C ∪ {X̂}
7: end for
8: X ← arg max_{X ∈ C} TR(XᵀAX)
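The core inner step of the pseudocode above (line 5, delegated to the paper's Alg. 2) selects k disjoint s-sparse supports maximizing the sum of squared projections, which the paper casts as a bipartite matching between features and components. The sketch below is an assumption-laden illustration, not the authors' Matlab implementation: the function name `disjoint_sparse_components` is hypothetical, and the column-replication trick for enforcing s features per component assumes d ≥ k·s.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def disjoint_sparse_components(W, s):
    """Sketch: assign features to components with disjoint supports of
    size s each, maximizing the total squared weight, via a bipartite
    matching on the squared entries of W (shape d x k). Assumes d >= k*s."""
    d, k = W.shape
    # Replicate each component's column s times so the assignment gives
    # every component exactly s distinct features; negate to maximize.
    cost = -np.repeat(W ** 2, s, axis=1)          # shape (d, k*s)
    rows, cols = linear_sum_assignment(cost)
    X = np.zeros((d, k))
    for i, c in zip(rows, cols):
        j = c // s                                # recover the component index
        X[i, j] = W[i, j]
    # On each fixed support, the maximizer is W_j restricted to that
    # support and normalized to unit length.
    norms = np.linalg.norm(X, axis=0)
    X /= np.where(norms > 0, norms, 1.0)
    return X
```

Because `linear_sum_assignment` matches each row (feature) to at most one replicated column, the returned supports are disjoint by construction.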
Open Source Code | No | The paper states, 'Our experiments are conducted in a Matlab environment. Due to its nature, our algorithm is easily parallelizable; its prototypical implementation utilizes the Parallel Pool Matlab feature to exploit multicore (or distributed cluster) capabilities.' However, it does not provide any specific links or explicit statements about code availability.
Open Datasets | Yes | "Leukemia Dataset. We evaluate our algorithm on the Leukemia dataset [31]. ... Additional Datasets. We repeat the experiment on multiple datasets, arbitrarily selected from [31]. ... Bag of Words (BoW) Dataset. [31]" Reference [31]: M. Lichman, UCI Machine Learning Repository, 2013.
Dataset Splits | No | The paper mentions several datasets (e.g., Leukemia, Bag of Words) but does not specify how they were split into training, validation, or test sets, nor does it cite standard splits. It mentions 'PEMS TRAIN' and 'ARCENE TRAIN', but these are dataset names, not descriptions of a training split.
Hardware Specification | No | The paper states, 'Our experiments are conducted in a Matlab environment. Due to its nature, our algorithm is easily parallelizable; its prototypical implementation utilizes the Parallel Pool Matlab feature to exploit multicore (or distributed cluster) capabilities.' However, it does not provide specific details on the CPU, GPU models, memory, or other hardware specifications used.
Software Dependencies | No | The paper states that 'Our experiments are conducted in a Matlab environment' but does not specify a version of Matlab or of any other software dependency.
Experiment Setup | Yes | "Unless otherwise specified, it is configured for a rank-4 approximation obtained via truncated SVD. ... We extract k = 5 sparse components, each active on s = 50 features. ... For algorithms that are randomly initialized, we depict best results over multiple random restarts. ... We set a barrier on the execution time of our algorithm at the cost of the theoretical approximation guarantees; the algorithm returns the best result at the time of termination."
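The rank-4 preprocessing quoted above is a plain truncated SVD of the input matrix, after which the k = 5 components with s = 50 active features each are extracted. A minimal sketch, assuming NumPy and a hypothetical helper name `low_rank_approx` (not from the paper):

```python
import numpy as np

def low_rank_approx(A, r=4):
    """Rank-r approximation via truncated SVD; the paper's experiments
    use r = 4 unless otherwise specified."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the r leading singular triplets.
    return (U[:, :r] * s[:r]) @ Vt[:r, :]
```

By the Eckart-Young theorem, this is the best rank-r approximation, with spectral-norm error equal to the (r+1)-th singular value of A.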