Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing
Authors: Arun Jambulapati, Jerry Li, Kevin Tian
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We develop two methods for the following fundamental statistical task: given an ϵ-corrupted set of n samples from a d-dimensional sub-Gaussian distribution, return an approximate top eigenvector of the covariance matrix. Our first robust PCA algorithm runs in polynomial time, returns a 1 O(ϵ log ϵ 1)-approximate top eigenvector, and is based on a simple iterative filtering approach. Our second, which attains a slightly worse approximation factor, runs in nearly-linear time and sample complexity under a mild spectral gap assumption. These are the first polynomial-time algorithms yielding non-trivial information about the covariance of a corrupted sub-Gaussian distribution without requiring additional algebraic structure of moments. |
| Researcher Affiliation | Collaboration | Arun Jambulapati Stanford University EMAIL Jerry Li Microsoft Research EMAIL Kevin Tian Stanford University EMAIL |
| Pseudocode | Yes | Algorithm 1 Packing LP(A, ϵ) ... Algorithm 2 PNorm Packing(A, ϵ, p) ... Algorithm 3 Schatten Packing({Ai}i [n], ϵ, p) ... Algorithm 4 Robust PCA({Xi}i [n], ϵ, t) ... Algorithm 6 |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that open-source code for the described methodology is provided. |
| Open Datasets | No | The paper discusses theoretical properties of algorithms for sub-Gaussian distributions and does not specify the use of any named, publicly available datasets for training. |
| Dataset Splits | No | The paper is theoretical and does not describe training, validation, or test dataset splits for empirical reproduction. |
| Hardware Specification | No | The paper focuses on theoretical algorithms and their performance guarantees, and does not mention any specific hardware used for experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper does not provide details about experimental setup, such as hyperparameter values or training configurations, as it is a theoretical work. |