Sub-exponential time Sum-of-Squares lower bounds for Principal Components Analysis

Authors: Aaron Potechin, Goutham Rajendran

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we study the limits of the powerful Sum-of-Squares (SoS) family of algorithms for Sparse PCA. We show that SoS algorithms cannot beat known spectral algorithms, even if we allow sub-exponential time! Therefore, this suggests that currently used algorithms such as thresholding or other spectral algorithms are in a sense optimal for this problem. Moreover, our techniques are strong enough to obtain similar tradeoffs for Tensor PCA, another important higher-order variant of PCA with applications in topic modeling, video processing, etc. Our main contributions can be summarized as follows: 1. Despite the huge breakthroughs achieved by Sum-of-Squares algorithms in recent works on high-dimensional statistics, we show barriers to them for the fundamental problems of Sparse PCA and Tensor PCA. 3. Prior lower bounds for these problems have either focused on weaker classes of algorithms or were obtained assuming other hardness conjectures, whereas we prove high-degree sub-exponential time SoS lower bounds without relying on any conjectures. 'If you are including theoretical results... Did you include complete proofs of all theoretical results? [Yes] See the supplementary.' 'If you ran experiments... [N/A]' (An illustrative sketch of the spectral baselines mentioned here appears after this table.)
Researcher Affiliation | Academia | Aaron Potechin, University of Chicago (potechin@uchicago.edu); Goutham Rajendran, University of Chicago (goutham@uchicago.edu)
Pseudocode | No | The paper focuses on theoretical lower bounds and does not present pseudocode or algorithm blocks.
Open Source Code | No | 'Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]'
Open Datasets | No | The paper uses theoretical models, namely the 'Wishart random model of Sparse PCA' and 'Tensor PCA', rather than concrete, publicly available datasets; no publicly available training dataset is mentioned. 'If you ran experiments... [N/A]' (Illustrative sketches of these synthetic models appear after this table.)
Dataset Splits | No | The paper is theoretical and does not involve experimental runs with data splits, so no training/test/validation splits are provided. 'If you ran experiments... [N/A]'
Hardware Specification | No | The paper is theoretical and does not describe any experimental hardware. 'If you ran experiments... [N/A]'
Software Dependencies | No | The paper is theoretical and does not list any software dependencies or version numbers. 'If you ran experiments... [N/A]'
Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details such as hyperparameters or training configurations. 'If you ran experiments... [N/A]'
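
The "thresholding or other spectral algorithms" referenced in the Research Type row are the baselines that the paper's lower bounds show SoS cannot beat: degree-d SoS can be solved in roughly n^(O(d)) time, so "sub-exponential time" corresponds to SoS degree n^delta for some constant delta > 0. Purely as an illustration (the paper itself contains no experiments), here is a minimal sketch of a classical diagonal-thresholding baseline for the spiked Wishart model of Sparse PCA, assuming the standard parameterization N(0, I + lam * u u^T) with a k-sparse unit vector u; the function names and parameter values below are our own choices, not the paper's.

```python
import numpy as np

def sample_spiked_wishart(m, n, k, lam, rng):
    """Draw m samples from N(0, I + lam * u u^T), where u is a k-sparse
    unit vector with entries +-1/sqrt(k) on a uniformly random support."""
    u = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    u[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
    g = rng.standard_normal((m, n))      # isotropic Gaussian component
    z = rng.standard_normal((m, 1))      # coefficient along the planted spike
    return g + np.sqrt(lam) * z * u, u   # covariance is I + lam * u u^T

def diagonal_thresholding(X, k):
    """Johnstone-Lu style baseline: pick the k coordinates with the largest
    diagonal entries of the sample covariance, then return the top
    eigenvector of the covariance restricted to those coordinates."""
    m, n = X.shape
    cov = X.T @ X / m
    support = np.argsort(np.diag(cov))[-k:]
    block = cov[np.ix_(support, support)]
    _, vecs = np.linalg.eigh(block)      # eigenvalues in ascending order
    u_hat = np.zeros(n)
    u_hat[support] = vecs[:, -1]         # eigenvector of the largest eigenvalue
    return u_hat

rng = np.random.default_rng(0)
X, u = sample_spiked_wishart(m=2000, n=500, k=10, lam=3.0, rng=rng)
u_hat = diagonal_thresholding(X, k=10)
print(f"overlap |<u, u_hat>| = {abs(u @ u_hat):.3f}")  # close to 1 for these parameters
```

The paper's result indicates that, in the hard regime of parameters, even sub-exponential time SoS cannot substantially improve on spectral baselines of this kind.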
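Likewise, the 'Tensor PCA' model named in the Open Datasets row is a synthetic distribution rather than a dataset. Below is a minimal sketch, assuming the common order-3 spiked tensor parameterization T = lam * u ⊗ u ⊗ u + G with Gaussian noise G, of both the model and the tensor-unfolding spectral baseline; the n^(3/4) threshold noted in the comments is the standard one for unfolding and is our annotation, not a quotation from the paper.

```python
import numpy as np

def sample_spiked_tensor(n, lam, rng):
    """Order-3 spiked tensor model: T = lam * u ⊗ u ⊗ u + G, where u is a
    uniformly random unit vector and G has i.i.d. N(0, 1) entries."""
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    signal = lam * np.einsum('i,j,k->ijk', u, u, u)
    return signal + rng.standard_normal((n, n, n)), u

def unfolding_estimate(T):
    """Spectral baseline via tensor unfolding: flatten T into an n x n^2
    matrix and return its top left singular vector. This estimate is
    informative once lam is a large constant times n^(3/4)."""
    n = T.shape[0]
    U, _, _ = np.linalg.svd(T.reshape(n, n * n), full_matrices=False)
    return U[:, 0]

rng = np.random.default_rng(1)
n = 50
T, u = sample_spiked_tensor(n, lam=3.0 * n ** 0.75, rng=rng)
print(f"overlap |<u, unfolding>| = {abs(u @ unfolding_estimate(T)):.3f}")
```

The paper's Tensor PCA lower bounds say that even sub-exponential time SoS cannot go far below the threshold that spectral methods of this kind achieve.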