Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing

Authors: Arun Jambulapati, Jerry Li, Kevin Tian

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We develop two methods for the following fundamental statistical task: given an ϵ-corrupted set of n samples from a d-dimensional sub-Gaussian distribution, return an approximate top eigenvector of the covariance matrix. Our first robust PCA algorithm runs in polynomial time, returns a (1 − O(ϵ log ϵ⁻¹))-approximate top eigenvector, and is based on a simple iterative filtering approach. Our second, which attains a slightly worse approximation factor, runs in nearly-linear time and sample complexity under a mild spectral gap assumption. These are the first polynomial-time algorithms yielding non-trivial information about the covariance of a corrupted sub-Gaussian distribution without requiring additional algebraic structure of moments.
Researcher Affiliation | Collaboration | Arun Jambulapati (Stanford University, jmblpati@stanford.edu); Jerry Li (Microsoft Research, jerrl@microsoft.com); Kevin Tian (Stanford University, kjtian@stanford.edu)
Pseudocode | Yes | Algorithm 1 Packing LP(A, ϵ) ... Algorithm 2 PNorm Packing(A, ϵ, p) ... Algorithm 3 Schatten Packing({A_i}_{i∈[n]}, ϵ, p) ... Algorithm 4 Robust PCA({X_i}_{i∈[n]}, ϵ, t) ... Algorithm 6
Open Source Code | No | The paper does not contain any explicit statements or links indicating that open-source code for the described methodology is provided.
Open Datasets | No | The paper discusses theoretical properties of algorithms for sub-Gaussian distributions and does not specify the use of any named, publicly available datasets for training.
Dataset Splits | No | The paper is theoretical and does not describe training, validation, or test dataset splits for empirical reproduction.
Hardware Specification | No | The paper focuses on theoretical algorithms and their performance guarantees, and does not mention any specific hardware used for experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper does not provide details about experimental setup, such as hyperparameter values or training configurations, as it is a theoretical work.
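For concreteness, the ϵ-corruption model from the abstract can be illustrated with a small sketch. The Gaussian base distribution, the function name, and the specific adversarial replacements below are assumptions made for illustration, not constructions from the paper:

```python
import numpy as np

def corrupted_samples(n, d, eps, seed=0):
    """Draw n clean samples from N(0, I_d) (a sub-Gaussian distribution),
    then let an 'adversary' replace an eps fraction with arbitrary points
    (here: far-away outliers along the last coordinate). Hypothetical helper."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))          # clean sub-Gaussian samples
    k = int(eps * n)                     # number of corrupted points
    X[:k] = 0.0
    X[:k, -1] = 20.0                     # adversarial replacements
    return X
```

With eps = 0.1 and n = 1000, the first 100 rows are adversarial; the naive top eigenvector of the empirical covariance is pulled toward the outlier direction, which is exactly the failure mode the paper's robust algorithms guard against.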
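The iterative filtering idea behind the paper's first algorithm (cf. Algorithm 4, Robust PCA) can be sketched in a few lines: repeatedly compute a weighted top eigenvector, score samples by their squared projection onto it, and downweight outlying scores. This is a minimal illustrative sketch; the weighting rule, quantile threshold, and stopping condition here are simplifying assumptions and do not carry the paper's guarantees:

```python
import numpy as np

def robust_top_eigvec(X, eps, iters=50):
    """Hedged sketch of iterative filtering for robust PCA.

    X: (n, d) samples, of which an eps fraction may be corrupted.
    Alternates between (a) the top eigenvector of the weighted empirical
    covariance and (b) soft downweighting of samples whose squared
    projections onto that eigenvector are outlying.
    """
    n, d = X.shape
    w = np.ones(n) / n                        # sample weights
    for _ in range(iters):
        mu = w @ X                            # weighted mean
        Xc = X - mu
        cov = (w[:, None] * Xc).T @ Xc        # weighted covariance
        v = np.linalg.eigh(cov)[1][:, -1]     # top eigenvector (eigh is ascending)
        scores = (Xc @ v) ** 2                # squared projections
        thresh = np.quantile(scores, 1 - 2 * eps)
        mask = scores > thresh
        if not mask.any():
            break
        # soft filter: the larger the score, the more weight is removed
        w[mask] *= 1.0 - scores[mask] / scores.max()
        w = np.maximum(w, 0.0)
        w /= w.sum()
    return v
```

On data with a planted high-variance direction plus a small fraction of large outliers, the naive top eigenvector chases the outliers while the filtered one recovers the planted direction; this mirrors the qualitative behavior the paper proves, not its quantitative (1 − O(ϵ log ϵ⁻¹)) bound.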