Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
A Geometric Analysis of PCA
Authors: Ayoub El Hanchi, Murat A Erdogdu, Chris J Maddison
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | What property of the data distribution determines the excess risk of principal component analysis? In this paper, we provide a precise answer to this question. We establish a central limit theorem for the error of the principal subspace estimated by PCA, and derive the asymptotic distribution of its excess risk under the reconstruction loss. We obtain a non-asymptotic upper bound on the excess risk of PCA that recovers, in the large sample limit, our asymptotic characterization. Underlying our contributions is the following result: we prove that the negative block Rayleigh quotient, defined on the Grassmannian, is generalized self-concordant along geodesics emanating from its minimizer of maximum rotation less than π/4. |
| Researcher Affiliation | Academia | Ayoub El Hanchi (University of Toronto & Vector Institute); Murat A. Erdogdu (University of Toronto & Vector Institute); Chris J. Maddison (University of Toronto & Vector Institute) |
| Pseudocode | No | The paper describes theoretical analysis and mathematical derivations of PCA. There are no explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statements about releasing code or links to source code repositories. The NeurIPS checklist also indicates 'NA' for experiments and associated code. |
| Open Datasets | No | The paper discusses theoretical properties of PCA using generic 'i.i.d. data points $(X_i)_{i=1}^n$ in $\mathbb{R}^d$' and the 'spiked covariance model' as an example. No specific datasets requiring public access information are mentioned or used for empirical evaluation. The NeurIPS checklist indicates 'NA' for experiments. |
| Dataset Splits | No | The paper does not describe any experiments involving datasets, thus no information on training/test/validation dataset splits is provided. |
| Hardware Specification | No | The paper is theoretical and does not conduct experiments, so there are no details provided regarding specific hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not involve experimental results, thus no specific software dependencies with version numbers are provided. |
| Experiment Setup | No | The paper is theoretical and does not involve any experimental setup or training, so no details on hyperparameters or specific configurations are provided. |
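
For readers unfamiliar with the terms in the Research Type response, the standard link between the reconstruction loss and the block Rayleigh quotient can be sketched as follows. The notation is an assumption made for illustration and may differ from the paper's own notation and normalization: $\Sigma = \mathbb{E}[XX^\top]$ is the second-moment matrix of the data, $U$ is a $d \times k$ matrix with orthonormal columns ($U^\top U = I_k$), and $U_\star$ is a minimizer of the risk.

```latex
\begin{align*}
R(U) &= \mathbb{E}\,\lVert X - UU^\top X \rVert_2^2 \\
     &= \mathbb{E}[X^\top X] - 2\,\mathbb{E}[X^\top UU^\top X]
        + \mathbb{E}[X^\top U (U^\top U) U^\top X] \\
     &= \operatorname{tr}(\Sigma) - \operatorname{tr}(U^\top \Sigma U),
     % uses U^\top U = I_k and E[X^\top A X] = tr(A \Sigma)
\\[4pt]
R(U) - R(U_\star) &= \operatorname{tr}(U_\star^\top \Sigma U_\star)
                   - \operatorname{tr}(U^\top \Sigma U) \;\ge\; 0 .
\end{align*}
```

Under these assumptions, minimizing the reconstruction risk is equivalent to minimizing the negative block Rayleigh quotient $-\operatorname{tr}(U^\top \Sigma U)$. Since $\operatorname{tr}(U^\top \Sigma U) = \operatorname{tr}(\Sigma\, UU^\top)$ depends on $U$ only through the projector $UU^\top$, the objective is a function of the column span of $U$, that is, of a point on the Grassmannian.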