Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Spectral Perturbation Bounds for Low-Rank Approximation with Applications to Privacy

Authors: Phuc Tran, Van Vu, Nisheeth K. Vishnoi

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical results on real-world datasets confirm that our bounds closely track the actual spectral error under diverse perturbation regimes. 4 Empirical results In this section, we empirically evaluate the sharpness of our spectral-gap bound (Theorem 2.1) in real-world settings central to privacy-preserving low-rank approximation. We compare: (1) the actual spectral error Ap Ap , (2) our theoretical bound3 7 E λp δp , (3) and the classical Eckart Young Mirsky (EYM) bound 2( E + λp+1). Each quantity is computed over 100 trials and 20 noise levels.
Researcher Affiliation Academia Phuc Tran Vin University Nisheeth K. Vishnoi Yale University Van H. Vu Yale University Correspondence to EMAIL.
Pseudocode No The paper describes methods and proof outlines but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code Yes Code and instructions are provided in the supplemental material.
Open Datasets Yes We study three covariance matrices A from the UCI Machine Learning Repository [13]: the 1990 US Census (n = 69), the 1998 KDD-Cup network-intrusion data (n = 416), and the Adult dataset (n = 6). These matrices henceforth Census, KDD, and Adult are standard benchmarks in DP PCA [3, 11, 29]. [13] Dheeru Dua and Casey Graff. UCI machine learning repository. https://archive.ics. uci.edu/ml, 2017.
Dataset Splits No The paper describes experimental conditions such as noise levels and number of trials (e.g., "20 noise levels" and "100 independent trials") but does not mention traditional training/test/validation splits for machine learning datasets. The experiments involve perturbing matrices and evaluating bounds, not typical supervised or unsupervised learning tasks requiring dataset splits.
Hardware Specification No The experiments are lightweight and run on standard CPU machines; resource requirements are described in the supplemental material.
Software Dependencies No The paper mentions that code and instructions are provided in the supplemental material but does not list specific software dependencies with version numbers in the main text.
Experiment Setup Yes Each matrix is perturbed with either GOE noise E1 or Rademacher noise E2, scaled by twenty evenly spaced factors ranging from 0 to 1. For each configuration (A, Ek, p), we run 100 independent trials. Each data matrix is preprocessed as follows: non-numeric entries are replaced with 0; rows shorter than the maximum length are padded with zeros; each row is scaled to unit Euclidean norm; and each column is centered to have zero mean. We compute the covariance matrix A := M M, where M is the processed data matrix.