CoinPress: Practical Private Mean and Covariance Estimation

Authors: Sourav Biswas, Yihe Dong, Gautam Kamath, Jonathan Ullman

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our algorithms both theoretically and empirically using synthetic and real-world datasets, showing that their asymptotic error rates match the state-of-the-art theoretical bounds, and that they concretely outperform all previous methods.
Researcher Affiliation | Collaboration | Sourav Biswas, Cheriton School of Computer Science, University of Waterloo, s23biswa@uwaterloo.ca; Yihe Dong, Microsoft, yihdong@microsoft.com; Gautam Kamath, Cheriton School of Computer Science, University of Waterloo, g@csail.mit.edu; Jonathan Ullman, Khoury College of Computer Sciences, Northeastern University, jullman@ccs.neu.edu
Pseudocode | Yes | Figure 2: Mean Estimation Algorithms
Open Source Code | Yes | Full version of the paper and code are available in the supplement, or alternatively at the following URLs: https://arxiv.org/abs/2006.06618 and https://github.com/twistedcubic/coin-press
Open Datasets | Yes | We generate a dataset of n samples from a d-dimensional Gaussian N(0, I), where we are promised the mean is bounded in ℓ2-distance by R. We revisit the classic genes-mirror-geography discovery of Novembre et al. [30]. We obtained a 20-dimensional version of the dataset (n = 1387) from the authors' GitHub: https://github.com/NovembreLab/Novembre_etal_2008_misc
Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits. It evaluates parameter estimators on generated samples or a given dataset rather than training a supervised model with distinct splits.
Hardware Specification | Yes | Most of the plots (each involving about 1,000 runs of our algorithm) took only a few minutes to generate on a laptop computer with an Intel Core i7-7700HQ CPU. All experiments were completed within a few minutes on the same machine.
Software Dependencies | No | The paper mentions using its own implementation and the code from [14] for SYMQ, but it does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | Our algorithm has essentially four hyperparameters: the choice of t, the split of the privacy budget, the radius of the clipping ball, and the radius of the confidence ball. We explore the role of t in our experiments. We found that assigning most of the privacy budget to the final iteration increased performance, namely 3ρ/4 goes to the final iteration and ρ/(4(t−1)) to each of the others. We run each method 100 times and report the trimmed mean with trimming parameter 0.1.
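The privacy-budget split described in the experiment setup can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name and the total budget ρ = 1.0 in the usage line are assumptions.

```python
# Sketch of the budget split described above: 3*rho/4 of the (zCDP) privacy
# budget goes to the final iteration, and rho/(4*(t-1)) to each earlier one.
def split_budget(rho, t):
    """Return a list of t per-iteration privacy budgets summing to rho."""
    if t == 1:
        return [rho]
    early = rho / (4 * (t - 1))               # each of the first t-1 iterations
    return [early] * (t - 1) + [3 * rho / 4]  # most of the budget goes last
```

For example, `split_budget(1.0, 4)` allocates 1/12 to each of the first three iterations and 3/4 to the last, so the allocations sum to the total budget.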
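The evaluation protocol quoted above (synthetic N(0, I) data, 100 repeated runs, trimmed mean with trimming parameter 0.1) can be sketched as below. This is a hedged sketch: the non-private empirical mean stands in for the paper's private estimators, and the dimensions and variable names are illustrative assumptions.

```python
import numpy as np

def trimmed_mean(values, trim=0.1):
    """Mean after dropping the smallest and largest trim-fraction of values."""
    v = np.sort(np.asarray(values, dtype=float))
    k = int(trim * len(v))
    return v[k:len(v) - k].mean()

rng = np.random.default_rng(0)
d, n = 20, 1000
errors = []
for _ in range(100):                      # 100 runs, as in the paper's setup
    samples = rng.standard_normal((n, d)) # n draws from N(0, I) in d dimensions
    est = samples.mean(axis=0)            # placeholder (non-private) estimator
    errors.append(np.linalg.norm(est))    # l2 error against the true mean 0
print(trimmed_mean(errors, trim=0.1))     # the reported error statistic
```

The trimming step discards outlier runs at both tails before averaging, which is why a single unlucky run does not dominate the reported error.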