Differentially Private Quantiles
Authors: Jennifer Gillenwater, Matthew Joseph, Alex Kulesza
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now empirically evaluate Joint Exp against three alternatives: App Ind Exp, CSmooth, and Agg Tree. We evaluate our four algorithms on four datasets: synthetic Gaussian data from N(0, 5), synthetic uniform data from U(-5, 5), and real collections of book ratings and page counts from Goodreads (Soumik, 2019) (Figure 2). |
| Researcher Affiliation | Industry | Equal contributions, all authors at Google Research New York. Correspondence to: Jennifer Gillenwater <jengi@google.com>, Matthew Joseph <mtjoseph@google.com>, Alex Kulesza <kulesza@google.com>. |
| Pseudocode | Yes | Algorithm 1 Pseudocode for Joint Exp |
| Open Source Code | Yes | All experiment code is publicly available (Google, 2021). Google. dp multiq. https://github.com/google-research/google-research/tree/master/dp_multiq, 2021. |
| Open Datasets | Yes | We evaluate our four algorithms on four datasets: synthetic Gaussian data from N(0, 5), synthetic uniform data from U(-5, 5), and real collections of book ratings and page counts from Goodreads (Soumik, 2019) (Figure 2). |
| Dataset Splits | No | No specific dataset split information (percentages, counts, or predefined splits) for training, validation, or testing was found. The paper mentions '20 trials of 1000 random samples'. |
| Hardware Specification | Yes | All experiments were run on a machine with two CPU cores and 100GB RAM. |
| Software Dependencies | No | The paper mentions 'scipy.special.logsumexp' and refers to a 'racing sampling method' and numerical improvements, but does not provide specific version numbers for software dependencies like Python or SciPy itself. |
| Experiment Setup | Yes | In each case, the requested quantiles are evenly spaced. m = 1 is median estimation, m = 2 requires estimating the 33rd and 67th percentiles, and so on. We average scores across 20 trials of 1000 random samples. For every experiment, we take [-100, 100] as the (loose) user-provided data range. For the Goodreads page numbers dataset, we also divide each value by 100 to scale the values to [-100, 100]. Experiments for ε = 1 appear in Figure 3. |
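The experiment setup above can be sketched in code. The snippet below is a minimal illustration of the quoted protocol, not the paper's private Joint Exp algorithm: it computes the evenly spaced quantile targets (m = 1 gives the median, m = 2 gives the 33rd and 67th percentiles) and draws one trial of 1000 Gaussian samples clipped to the user-provided range [-100, 100]. Whether N(0, 5) denotes standard deviation or variance 5 is an assumption here; the sketch uses scale = 5.

```python
import numpy as np

def evenly_spaced_quantiles(m):
    """Return m quantile targets evenly spaced in (0, 1).

    m = 1 yields the median (0.5); m = 2 yields 1/3 and 2/3,
    i.e. the 33rd and 67th percentiles described in the paper.
    """
    return [j / (m + 1) for j in range(1, m + 1)]

# One illustrative trial: 1000 samples from a Gaussian (scale 5 assumed),
# clipped to the loose user-provided data range [-100, 100].
rng = np.random.default_rng(0)
samples = np.clip(rng.normal(loc=0.0, scale=5.0, size=1000), -100, 100)

# Non-private reference quantile estimates for m = 2; the paper's
# algorithms would instead release differentially private estimates.
targets = evenly_spaced_quantiles(2)
estimates = np.quantile(samples, targets)
```

In the paper, scores are averaged over 20 such trials per configuration; the authors' DP implementations are in the dp_multiq repository linked above.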