DP-PCA: Statistically Optimal and Differentially Private PCA

Authors: Xiyang Liu, Weihao Kong, Prateek Jain, Sewoong Oh

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study the canonical statistical task of computing the principal component from n i.i.d. data in d dimensions under (ε, δ)-differential privacy. Although extensively studied in the literature, existing solutions fall short on two key aspects: (i) even for Gaussian data, existing private algorithms require the number of samples n to scale super-linearly with d, i.e., n = Ω(d^{3/2}), to obtain non-trivial results, while non-private PCA requires only n = O(d), and (ii) existing techniques suffer from a non-vanishing error even when the randomness in each data point is arbitrarily small. We propose DP-PCA, which is a single-pass algorithm that overcomes both limitations. It is based on a private minibatch gradient ascent method that relies on private mean estimation to add the minimal noise required to ensure privacy by adapting to the geometry of a given minibatch of gradients. For sub-Gaussian data, we provide nearly optimal statistical error rates even for n = O(d). Furthermore, we provide a lower bound showing that a sub-Gaussian-style assumption is necessary to obtain the optimal error rate. We don't have experiments. (A minimal sketch of the underlying Oja-style update appears after this table.)
Researcher Affiliation | Collaboration | Xiyang Liu, Paul Allen School of Computer Science & Engineering, University of Washington, xiyangl@cs.washington.edu; Weihao Kong, Google Research, weihaokong@google.com; Prateek Jain, Google Research, prajain@google.com; Sewoong Oh, Paul Allen School of Computer Science & Engineering, University of Washington, and Google Research, sewoong@cs.washington.edu
Pseudocode | Yes | Algorithm 1: (Non-private) Oja's Algorithm; Algorithm 2: Private Oja's Algorithm; Algorithm 3: Differentially Private Principal Component Analysis (DP-PCA); Algorithm 4; Algorithm 5. (A hedged sketch of a simplified private Oja-style step is given after this table.)
Open Source Code | No | The paper does not contain any statement about releasing open-source code for the methodology described. The self-reflection states "We don't have experiments" and implies no code release.
Open Datasets | No | The paper focuses on theoretical analysis of data sampled from distributions (e.g., "n i.i.d. data", "Gaussian data", "sub-Gaussian data") and does not refer to specific publicly available or open datasets with access information.
Dataset Splits | No | The paper is theoretical and does not conduct experiments; therefore, it does not provide details on training, validation, or test dataset splits.
Hardware Specification | No | The paper is theoretical and does not conduct experiments, so no hardware details are mentioned. The self-reflection states "We don't have experiments".
Software Dependencies | No | The paper is theoretical and does not conduct experiments; therefore, no software dependencies with version numbers are provided. The self-reflection states "We don't have experiments".
Experiment Setup | No | The paper is theoretical and does not conduct experiments; therefore, it does not describe experimental setup details such as hyperparameters or training configurations. The self-reflection states "We don't have experiments".
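
The abstract above describes a single-pass, Oja-style method. The following is a minimal sketch of the non-private streaming update that the paper's Algorithm 1 ((Non-private) Oja's Algorithm) refers to: a single pass over the data with the update w ← normalize(w + η_t · x_t x_tᵀ w). The function name and the 1/(t+1) step-size schedule are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of (non-private) Oja's algorithm for the top principal
# component: one streaming pass, one rank-one gradient step per sample.
import numpy as np

def oja_top_eigenvector(X, eta0=1.0, seed=0):
    """Single-pass estimate of the top eigenvector of the covariance of X.

    X    : array of shape (n, d), rows are i.i.d. samples.
    eta0 : base learning rate; eta_t = eta0 / (t + 1) is an assumed schedule.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for t, x in enumerate(X):
        eta_t = eta0 / (t + 1)
        w = w + eta_t * x * (x @ w)   # step along the gradient x x^T w
        w /= np.linalg.norm(w)        # project back onto the unit sphere
    return w
```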
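For the private algorithms (Algorithms 2 and 3), the sketch below shows a simplified private minibatch Oja-style step using plain per-sample norm clipping plus spherical Gaussian noise (the standard Gaussian mechanism). This is a deliberate simplification: the paper's DP-PCA instead privately estimates the mean gradient with noise adapted to the geometry of each minibatch, which is what yields its improved sample complexity. The clip norm, batch handling, learning rate, and noise multiplier are illustrative assumptions.

```python
# Hedged sketch of one noisy minibatch gradient-ascent step for the top
# eigenvector; a Gaussian-mechanism stand-in, not the paper's exact DP-PCA.
import numpy as np

def private_oja_step(w, batch, clip=1.0, noise_multiplier=1.0, eta=0.1, rng=None):
    """One private-style update of the unit-norm iterate w from a minibatch.

    w     : current unit-norm iterate, shape (d,)
    batch : minibatch of samples, shape (B, d)
    """
    rng = rng or np.random.default_rng()
    B, d = batch.shape
    # Per-sample gradients g_i = x_i x_i^T w, clipped to norm <= clip.
    grads = batch * (batch @ w)[:, None]
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    # Noisy mean gradient: each sample changes the sum by at most `clip`.
    noise = rng.standard_normal(d) * (noise_multiplier * clip / B)
    g_priv = grads.mean(axis=0) + noise
    w = w + eta * g_priv
    return w / np.linalg.norm(w)
```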