Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA

Authors: Ilias Diakonikolas, Daniel Kane, Ankit Pensia, Thanasis Pittas

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical Our main contribution is to develop a nearly linear time algorithm for robust PCA with near-optimal error guarantees. We also develop a single-pass streaming algorithm for robust PCA with memory usage nearly-linear in the dimension. ... Theorem 1.2. Let ϵ (0, c) for a small constant c > 0. Let D be a subgaussian distribution with mean zero and (unknown) covariance matrix Σ. ... There exists an algorithm that takes as inputs S and ϵ, runs in e O(nd/ϵ2) time and with probability at least 0.99 outputs a unit vector u such that u Σu (1 O(ϵ log(1/ϵ))) Σ op.
Researcher Affiliation Academia 1University of Wisconsin-Madison 2University of California, San Diego. Correspondence to: Ankit Pensia <ankitp@cs.wisc.edu>, Thanasis Pittas <pittas@wisc.edu>.
Pseudocode Yes Algorithm 1 HARDTHRESHOLDINGFILTER; Algorithm 2 SAMPLETOPEIGENVECTOR; Algorithm 3 Robust PCA in Small Number of Iterations; Algorithm 4 HARDTHRESHOLDINGFILTER; Algorithm 5 ROBUSTPCA with improved runtime; Algorithm 6 Estimator of Bpk,t from minibatches; Algorithm 7 ROBUSTPCA in streaming model; Algorithm 8 SAMPLETOPEIGENVECTOR 2.
Open Source Code No The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology.
Open Datasets No The paper focuses on theoretical analysis of algorithms for robust PCA, using abstract statistical models and distributions (e.g., 'subgaussian distribution', 'ϵ-corrupted set of samples', 'P = (1 ϵ)G + ϵB') rather than specific, publicly available datasets for empirical evaluation.
Dataset Splits No The paper is theoretical and does not describe empirical experiments with dataset splits.
Hardware Specification No The paper is theoretical and does not describe empirical experiments or the specific hardware used.
Software Dependencies No The paper is theoretical and does not describe empirical experiments or specific software dependencies with version numbers.
Experiment Setup No The paper is theoretical and does not describe an experimental setup or hyperparameter details for empirical evaluation.