DP-PCA: Statistically Optimal and Differentially Private PCA
Authors: Xiyang Liu, Weihao Kong, Prateek Jain, Sewoong Oh
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the canonical statistical task of computing the principal component from n i.i.d. data in d dimensions under (ε, δ)-differential privacy. Although extensively studied in the literature, existing solutions fall short on two key aspects: (i) even for Gaussian data, existing private algorithms require the number of samples n to scale super-linearly with d, i.e., n = Ω(d^{3/2}), to obtain non-trivial results, while non-private PCA requires only n = O(d), and (ii) existing techniques suffer from a non-vanishing error even when the randomness in each data point is arbitrarily small. We propose DP-PCA, which is a single-pass algorithm that overcomes both limitations. It is based on a private minibatch gradient ascent method that relies on private mean estimation to add the minimal noise required to ensure privacy by adapting to the geometry of a given minibatch of gradients. For sub-Gaussian data, we provide nearly optimal statistical error rates even for n = O(d). Furthermore, we provide a lower bound showing that a sub-Gaussian style assumption is necessary to obtain the optimal error rate. We don't have experiments. (Hedged illustrative sketches of the non-private and private updates appear after this table.) |
| Researcher Affiliation | Collaboration | Xiyang Liu Paul Allen School of Computer Science & Engineering University of Washington xiyangl@cs.washington.edu Weihao Kong Google Research weihaokong@google.com Prateek Jain Google Research prajain@google.com Sewoong Oh Paul Allen School of Computer Science & Engineering University of Washington, and Google Research sewoong@cs.washington.edu |
| Pseudocode | Yes | Algorithm 1: (Non-private) Oja's Algorithm, Algorithm 2: Private Oja's Algorithm, Algorithm 3: Differentially Private Principal Component Analysis (DP-PCA), Algorithm 4, Algorithm 5 (illustrative sketches appear after the table) |
| Open Source Code | No | The paper does not contain any statement about releasing open-source code for the methodology described. The self-reflection states "We don't have experiments" and implies no code release. |
| Open Datasets | No | The paper focuses on theoretical analysis of data sampled from distributions (e.g., "n i.i.d. data", "Gaussian data", "sub-Gaussian data") and does not refer to specific publicly available or open datasets with access information. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments, and therefore does not provide details on training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not conduct experiments, so no specific hardware details are mentioned. The self-reflection states "We don't have experiments". |
| Software Dependencies | No | The paper is theoretical and does not conduct experiments, so no specific software dependencies with version numbers are provided. The self-reflection states "We don't have experiments". |
| Experiment Setup | No | The paper is theoretical and does not conduct experiments, so it does not describe experimental setup details such as hyperparameters or training configurations. The self-reflection states "We don't have experiments". |
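
The pseudocode entry above references (non-private) Oja's Algorithm (Algorithm 1 in the paper). The following is a minimal sketch of that standard single-pass streaming update, not the paper's exact pseudocode; the function name `oja`, the step-size schedule, and the random initialization are illustrative assumptions.

```python
import numpy as np

def oja(X, learning_rate=0.1, seed=0):
    """Hypothetical sketch: one pass over the rows of X (n x d), returning a
    unit-norm estimate of the top principal component."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)                 # start from a random unit vector
    for t, x in enumerate(X, start=1):
        eta = learning_rate / t            # decaying step size (illustrative choice)
        v = v + eta * x * (x @ v)          # Oja update: v <- v + eta * (x x^T) v
        v /= np.linalg.norm(v)             # project back to the unit sphere
    return v
```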
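The abstract describes the private variant as a minibatch gradient ascent that privately estimates the mean of the per-sample gradients. The sketch below illustrates that idea with a simple clip-and-add-Gaussian-noise mean estimate; this is a stand-in for the paper's adaptive private mean estimation, and the function name `private_oja_step` and parameters `clip_norm` and `noise_multiplier` are assumptions for illustration only.

```python
import numpy as np

def private_oja_step(v, batch, eta, clip_norm, noise_multiplier, rng):
    """Hypothetical sketch of one DP minibatch step: `batch` is (B x d) and
    `v` is the current unit-norm iterate."""
    B, d = batch.shape
    grads = batch * (batch @ v)[:, None]          # per-sample gradients (x_i x_i^T) v
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))  # clip to bound sensitivity
    noisy_mean = grads.mean(axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm / B, size=d)   # Gaussian noise on the minibatch mean
    v = v + eta * noisy_mean                      # ascent step with the privatized mean gradient
    return v / np.linalg.norm(v)                  # project back to the unit sphere
```

Chaining such steps over disjoint minibatches in a single pass mirrors the single-pass structure claimed in the abstract; calibrating `noise_multiplier` to a target (ε, δ) is outside the scope of this sketch.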