Bounding User Contributions: A Bias-Variance Trade-off in Differential Privacy
Authors: Kareem Amin, Alex Kulesza, Andrés Muñoz Medina, Sergei Vassilvitskii
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1. Error when privately estimating the number of ratings in the MovieLens dataset using different truncation strategies, averaged over 1000 runs. |
| Researcher Affiliation | Industry | 1Google Research New York, NY, USA. Correspondence to: Kareem Amin <kamin@google.com>, Alex Kulesza <kulesza@google.com>, Andrés Muñoz Medina <ammedina@google.com>, Sergei Vassilvitskii <sergeiv@google.com>. |
| Pseudocode | Yes | Algorithm 1 Contribution-bounded ERM. Input: Dataset S = (z1, . . . , zn) drawn from D, contribution fraction τ0 > 0, privacy parameter ε > 0. Construct Sτ where τ = τ0·n. Run ε-differentially private ERM on Sτ to obtain hpriv. Return hpriv. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | Example. The MovieLens 20M dataset consists of 20 million movie ratings from 138 thousand users. 2https://grouplens.org/datasets/movielens/ |
| Dataset Splits | No | The paper refers to empirical risk minimization and general learning algorithms, but does not provide specific details on training, validation, or test dataset splits used for its own experiments, such as percentages, sample counts, or splitting methodology. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory configurations used for running experiments. |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their versions) that would allow for replication of the experimental setup. |
| Experiment Setup | Yes | Figure 1. Error when privately estimating the number of ratings in the MovieLens dataset using different truncation strategies, averaged over 1000 runs. We compare three methods of contribution bounding, followed by the Laplace mechanism: (1) truncation of {xi} to the (1 − 1/(εn))-quantile, as suggested by our analysis, (2) the commonly used truncation at the median, and (3) truncation at the 95% quantile. |
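The contribution-bounding step in the quoted Algorithm 1 (constructing Sτ by keeping at most τ examples per user before running private ERM) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `bound_contributions` and the random choice of which records to keep are my own assumptions — the algorithm only requires that each user contribute at most τ examples.

```python
import random
from collections import defaultdict

def bound_contributions(records, tau, seed=0):
    """Construct the contribution-bounded dataset S_tau: keep at most
    `tau` examples per user. `records` is a list of (user_id, example)
    pairs; which examples survive is arbitrary, here a seeded random pick."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user, example in records:
        by_user[user].append(example)
    bounded = []
    for user, examples in by_user.items():
        rng.shuffle(examples)  # arbitrary subset of size <= tau
        bounded.extend((user, ex) for ex in examples[:tau])
    return bounded
```

After this step, any ε-differentially private ERM routine run on the bounded dataset inherits a per-user sensitivity of at most τ examples, which is what the algorithm's privacy analysis relies on.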
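The Figure 1 experiment — clip each user's rating count at a chosen quantile, then release the sum via the Laplace mechanism — can be sketched as below. This is an assumed reconstruction, not the paper's code: the function name, the empirical-quantile computation, and the inverse-CDF Laplace sampler are my own choices, and note that in a fully private pipeline the quantile itself would also need to be chosen privately.

```python
import math
import random

def truncated_laplace_sum(counts, quantile, epsilon, seed=0):
    """Clip each per-user count at the empirical `quantile`, then add
    Laplace noise with scale threshold/epsilon, since the clip threshold
    bounds the per-user sensitivity of the truncated sum."""
    rng = random.Random(seed)
    sorted_counts = sorted(counts)
    idx = min(int(quantile * len(sorted_counts)), len(sorted_counts) - 1)
    threshold = sorted_counts[idx]
    clipped_sum = sum(min(c, threshold) for c in counts)
    # Sample Laplace(0, threshold/epsilon) noise via the inverse CDF.
    u = rng.random() - 0.5
    noise = -(threshold / epsilon) * math.copysign(math.log(1 - 2 * abs(u)), u)
    return clipped_sum + noise
```

A lower quantile (e.g. the median) shrinks the noise scale but clips away more of the true total, while a higher quantile (e.g. 95%) keeps the total nearly intact at the cost of larger noise — the bias-variance trade-off the figure compares.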