Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

A Geometric Analysis of PCA

Authors: Ayoub El Hanchi, Murat A Erdogdu, Chris J Maddison

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	What property of the data distribution determines the excess risk of principal component analysis? In this paper, we provide a precise answer to this question. We establish a central limit theorem for the error of the principal subspace estimated by PCA, and derive the asymptotic distribution of its excess risk under the reconstruction loss. We obtain a non-asymptotic upper bound on the excess risk of PCA that recovers, in the large sample limit, our asymptotic characterization. Underlying our contributions is the following result: we prove that the negative block Rayleigh quotient, defined on the Grassmannian, is generalized self-concordant along geodesics emanating from its minimizer of maximum rotation less than π/4.
Researcher Affiliation	Academia	Ayoub El Hanchi University of Toronto & Vector Institute EMAIL Murat A. Erdogdu University of Toronto & Vector Institute EMAIL Chris J. Maddison University of Toronto & Vector Institute EMAIL
Pseudocode	No	The paper describes theoretical analysis and mathematical derivations of PCA. There are no explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	The paper does not contain any statements about releasing code or links to source code repositories. The NeurIPS checklist also indicates 'NA' for experiments and associated code.
Open Datasets	No	The paper discusses theoretical properties of PCA using generic 'i.i.d. data points (X_i)_{i=1}^n in R^d' and the 'spiked covariance model' as an example. No specific datasets requiring public access information are mentioned or used for empirical evaluation. The NeurIPS checklist indicates 'NA' for experiments.
Dataset Splits	No	The paper does not describe any experiments involving datasets, thus no information on training/test/validation dataset splits is provided.
Hardware Specification	No	The paper is theoretical and does not conduct experiments, so there are no details provided regarding specific hardware specifications.
Software Dependencies	No	The paper is theoretical and does not involve experimental results, thus no specific software dependencies with version numbers are provided.
Experiment Setup	No	The paper is theoretical and does not involve any experimental setup or training, so no details on hyperparameters or specific configurations are provided.