CoinPress: Practical Private Mean and Covariance Estimation
Authors: Sourav Biswas, Yihe Dong, Gautam Kamath, Jonathan Ullman
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our algorithms both theoretically and empirically using synthetic and real-world datasets, showing that their asymptotic error rates match the state-of-the-art theoretical bounds, and that they concretely outperform all previous methods. |
| Researcher Affiliation | Collaboration | Sourav Biswas, Cheriton School of Computer Science, University of Waterloo, s23biswa@uwaterloo.ca; Yihe Dong, Microsoft, yihdong@microsoft.com; Gautam Kamath, Cheriton School of Computer Science, University of Waterloo, g@csail.mit.edu; Jonathan Ullman, Khoury College of Computer Sciences, Northeastern University, jullman@ccs.neu.edu |
| Pseudocode | Yes | Figure 2: Mean Estimation Algorithms |
| Open Source Code | Yes | Full version of the paper and code are available in the supplement, or alternatively at the following URLs: https://arxiv.org/abs/2006.06618 and https://github.com/twistedcubic/coin-press |
| Open Datasets | Yes | We generate a dataset of n samples from a d-dimensional Gaussian N(0, I), where we are promised the mean is bounded in ℓ2-distance by R. We revisit the classic "genes mirror geography" discovery of Novembre et al. [30]. We obtained a 20-dimensional version of the dataset (n = 1387) from the authors' GitHub: https://github.com/NovembreLab/Novembre_etal_2008_misc |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits. It evaluates parameter estimators on generated samples or a given dataset rather than training a supervised model with distinct splits. |
| Hardware Specification | Yes | Most of the plots (each involving about 1,000 runs of our algorithm) took only a few minutes to generate on a laptop computer with an Intel Core i7-7700HQ CPU. All experiments were completed within a few minutes on this machine. |
| Software Dependencies | No | The paper mentions using its own implementation and the code from [14] for SYMQ, but it does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | Our algorithm has essentially four hyperparameters: the choice of t, the split of the privacy budget, the radius of the clipping ball, and the radius of the confidence ball. We explore the role of t in our experiments. We found that assigning most of the privacy budget to the final iteration increased performance, namely 3ρ/4 goes to the final iteration and ρ/(4(t−1)) to each of the others. We run each method 100 times and report the trimmed mean, with trimming parameter 0.1. |
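The budget split and the trimmed-mean reporting described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names `split_budget` and `trimmed_mean` are hypothetical, and the split assumes the total zCDP budget ρ is divided exactly as quoted (3ρ/4 to the last of t iterations, ρ/(4(t−1)) to each earlier one).

```python
import numpy as np

def split_budget(rho, t):
    """Split a total zCDP budget rho across t iterations as the paper
    describes: 3*rho/4 to the final iteration, rho/(4*(t-1)) to each
    of the earlier t-1 iterations."""
    if t == 1:
        return [rho]
    early = rho / (4 * (t - 1))
    return [early] * (t - 1) + [3 * rho / 4]

def trimmed_mean(values, trim=0.1):
    """Trimmed mean over repeated runs: sort, drop the top and bottom
    `trim` fraction, average the rest (the paper trims at 0.1 over
    100 runs per method)."""
    x = np.sort(np.asarray(values, dtype=float))
    k = int(trim * len(x))
    return x[k:len(x) - k].mean()

# Example: a budget of rho = 0.5 over t = 4 iterations.
budget = split_budget(0.5, 4)   # three small shares, then 3*rho/4
```

Note that the shares always sum to the total budget, ρ/(4(t−1)) · (t−1) + 3ρ/4 = ρ, so the composition of the t private steps stays within the overall privacy guarantee.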