Covariance-Aware Private Mean Estimation Without Private Covariance Estimation

Authors: Gavin Brown, Marco Gaboardi, Adam Smith, Jonathan Ullman, Lydia Zakynthinou

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We present two sample-efficient differentially private mean estimators for d-dimensional (sub)Gaussian distributions with unknown covariance. Informally, given n ≳ d/α² samples from such a distribution with mean µ and covariance Σ, our estimators output µ̃ such that ‖µ̃ − µ‖_Σ ≤ α, where ‖·‖_Σ is the Mahalanobis distance. All previous estimators with the same guarantee either require strong a priori bounds on the covariance matrix or require Ω(d^{3/2}) samples. Each of our estimators is based on a simple, general approach to designing differentially private mechanisms, but with novel technical steps to make the estimator private and sample-efficient. Our first estimator samples a point with approximately maximum Tukey depth using the exponential mechanism, but restricted to the set of points of large Tukey depth. Proving that this mechanism is private requires a novel analysis. Our second estimator perturbs the empirical mean of the data set with noise calibrated to the empirical covariance, without releasing the covariance itself. For both estimators, careful preprocessing of the data is required to satisfy differential privacy. (The Mahalanobis-norm guarantee above is written out informally in the note after the table.)
Researcher Affiliation | Academia | Gavin Brown, Department of Computer Science, Boston University, grbrown@bu.edu; Marco Gaboardi, Department of Computer Science, Boston University, gaboardi@bu.edu; Adam Smith, Department of Computer Science, Boston University, ads22@bu.edu; Jonathan Ullman, Khoury College of Computer Sciences, Northeastern University, jullman@ccs.neu.edu; Lydia Zakynthinou, Khoury College of Computer Sciences, Northeastern University, zakynthinou.l@northeastern.edu
Pseudocode | Yes | Algorithm 1: Restricted Exponential Mechanism A^E_{ε,δ,t}(x) ... Algorithm 2: Empirically Rescaled Gaussian Mechanism A^G_{ε,δ,β}(x). (An illustrative sketch of Algorithm 2's core idea appears after the table.)
Open Source Code | No | The paper states "Our estimators are not computationally efficient. We provide finite implementations which construct and search over a sufficiently fine grid, resulting in running time exponential in the dimension d and sample size n." but does not state that this code is publicly available. Furthermore, in the author checklist, the answer to question 3(a), "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)?", is "[N/A]".
Open Datasets | No | The paper is theoretical and focuses on algorithm design and theoretical guarantees for (sub)Gaussian distributions. It does not describe any empirical experiments involving training, validation, or test datasets. The author checklist explicitly states "[N/A]" for questions related to running experiments, including code, data, and training details.
Dataset Splits | No | The paper is theoretical and focuses on algorithm design and theoretical guarantees for (sub)Gaussian distributions. It does not describe any empirical experiments involving training, validation, or test splits. The author checklist explicitly states "[N/A]" for questions related to running experiments, including training details.
Hardware Specification | No | The paper is theoretical and does not describe empirical experiments, so it does not specify any hardware used. The author checklist explicitly states "[N/A]" for questions related to computing resources for experiments.
Software Dependencies | No | The paper is theoretical and does not describe empirical experiments, so it does not list any specific software dependencies with version numbers. The author checklist explicitly states "[N/A]" for questions related to computing resources for experiments.
Experiment Setup | No | The paper is theoretical and focuses on algorithm design and theoretical guarantees. It does not describe empirical experiments or their setup, such as hyperparameter values or training configurations. The author checklist explicitly states "[N/A]" for questions related to training details for experiments.
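
Note on the accuracy guarantee. The guarantee quoted in the Research Type row is stated in the Mahalanobis norm induced by the unknown covariance Σ. As a reference point, an informal restatement (with the failure probability β made explicit and the dependence on the privacy parameters ε, δ and on logarithmic factors suppressed; the exact bounds are in the paper's theorems) is:

    \|v\|_{\Sigma} = \sqrt{v^{\top} \Sigma^{-1} v},
    \qquad
    \Pr\big[\, \|\tilde{\mu}(X) - \mu\|_{\Sigma} \le \alpha \,\big] \ge 1 - \beta
    \quad \text{whenever } n \gtrsim d/\alpha^{2}.

Here \tilde{\mu}(X) denotes the estimator's output on a sample X of n points drawn from the (sub)Gaussian distribution.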
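
Sketch of the second estimator's core idea. The paper's Algorithm 2, the Empirically Rescaled Gaussian Mechanism, perturbs the empirical mean with Gaussian noise shaped by the empirical covariance, without releasing the covariance itself. The Python/NumPy sketch below illustrates only that rescaling idea: the function name and the noise-scale constant are assumptions made for illustration, and the paper's preprocessing step, which is essential for the privacy proof, is omitted, so this snippet on its own is not differentially private.

    import numpy as np

    def rescaled_gaussian_mean(x, eps, delta, rng=None):
        # Sketch: release the empirical mean plus Gaussian noise whose shape
        # is the empirical covariance. Without the paper's preprocessing step
        # this is NOT differentially private; it only shows the rescaling idea.
        rng = np.random.default_rng() if rng is None else rng
        n, d = x.shape
        mu_hat = x.mean(axis=0)                # empirical mean, shape (d,)
        sigma_hat = np.cov(x, rowvar=False)    # empirical covariance, (d, d)
        # Assumed Gaussian-mechanism-style noise scale, used here for
        # illustration only; the paper's Algorithm 2 has its own calibration
        # (with parameters eps, delta, beta) and proves privacy for it.
        scale = np.sqrt(2.0 * np.log(1.25 / delta)) / (eps * n)
        noise = rng.multivariate_normal(np.zeros(d), (scale ** 2) * sigma_hat)
        return mu_hat + noise

    # Example: estimate the mean of a correlated Gaussian with no a priori
    # bound on its covariance matrix.
    rng = np.random.default_rng(0)
    cov = np.array([[4.0, 1.0], [1.0, 0.5]])
    data = rng.multivariate_normal([1.0, -2.0], cov, size=5000)
    print(rescaled_gaussian_mean(data, eps=1.0, delta=1e-6, rng=rng))

Shaping the noise by the empirical covariance is what allows the error to be measured in the Mahalanobis norm ‖·‖_Σ without an a priori bound on Σ, which is the point of the second estimator described in the abstract.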