Differentially Private Quantiles

Authors: Jennifer Gillenwater, Matthew Joseph, Alex Kulesza

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now empirically evaluate JointExp against three alternatives: AppIndExp, CSmooth, and AggTree. We evaluate our four algorithms on four datasets: synthetic Gaussian data from N(0, 5), synthetic uniform data from U(-5, 5), and real collections of book ratings and page counts from Goodreads (Soumik, 2019) (Figure 2).
Researcher Affiliation | Industry | Equal contributions, all authors at Google Research New York. Correspondence to: Jennifer Gillenwater <jengi@google.com>, Matthew Joseph <mtjoseph@google.com>, Alex Kulesza <kulesza@google.com>.
Pseudocode | Yes | Algorithm 1: Pseudocode for JointExp
Open Source Code | Yes | All experiment code is publicly available (Google, 2021). Google. dp_multiq. https://github.com/google-research/google-research/tree/master/dp_multiq, 2021.
Open Datasets | Yes | We evaluate our four algorithms on four datasets: synthetic Gaussian data from N(0, 5), synthetic uniform data from U(-5, 5), and real collections of book ratings and page counts from Goodreads (Soumik, 2019) (Figure 2).
Dataset Splits | No | No specific dataset split information (percentages, counts, or predefined splits) for training, validation, or testing was found. The paper mentions '20 trials of 1000 random samples'.
Hardware Specification | Yes | All experiments were run on a machine with two CPU cores and 100GB RAM.
Software Dependencies | No | The paper mentions 'scipy.special.logsumexp' and refers to a 'racing sampling method' and numerical improvements, but does not provide specific version numbers for software dependencies like Python or SciPy itself.
Experiment Setup | Yes | In each case, the requested quantiles are evenly spaced: m = 1 is median estimation, m = 2 requires estimating the 33rd and 67th percentiles, and so on. We average scores across 20 trials of 1000 random samples. For every experiment, we take [-100, 100] as the (loose) user-provided data range. For the Goodreads page numbers dataset, we also divide each value by 100 to scale the values to [-100, 100]. Experiments for ε = 1 appear in Figure 3.
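
For concreteness, the synthetic datasets quoted in the Open Datasets row can be generated in a few lines. This is a minimal sketch, assuming N(0, 5) is a Gaussian with mean 0 and standard deviation 5 (the quote does not say whether 5 is a standard deviation or a variance) and U(-5, 5) is uniform on [-5, 5], with 1000 samples per trial as in the setup row:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # "20 trials of 1000 random samples" per the experiment setup

gaussian_data = rng.normal(loc=0.0, scale=5.0, size=n)   # N(0, 5), 5 assumed to be the std dev
uniform_data = rng.uniform(low=-5.0, high=5.0, size=n)   # U(-5, 5)

# Clip to the loose user-provided data range used in every experiment.
gaussian_data = np.clip(gaussian_data, -100, 100)
uniform_data = np.clip(uniform_data, -100, 100)
```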
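The paper's Algorithm 1 (JointExp) is an exponential-mechanism-based method for releasing several quantiles jointly. As a rough orientation only, here is a sketch of the classical single-quantile exponential mechanism that JointExp generalizes; this is not Algorithm 1 itself, and the function name and utility normalization are our own:

```python
import numpy as np

def single_quantile_exp_mech(data, q, eps, lower, upper, rng=None):
    """Differentially private estimate of the q-quantile of `data`.

    Illustrative sketch of the classical single-quantile exponential
    mechanism; not the paper's JointExp algorithm.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.clip(np.sort(np.asarray(data, dtype=float)), lower, upper)
    x = np.concatenate(([lower], x, [upper]))   # n + 2 interval endpoints
    n = len(data)
    widths = np.diff(x)                         # n + 1 candidate intervals
    idx = np.arange(n + 1)
    utility = -np.abs(idx - q * n)              # sensitivity-1 utility: rank error
    # Log-probability of each interval: log(width) + eps * utility / 2,
    # normalized stably by subtracting the max before exponentiating.
    logp = np.log(np.maximum(widths, 1e-300)) + eps * utility / 2.0
    logp -= logp.max()
    p = np.exp(logp)
    p /= p.sum()
    k = rng.choice(n + 1, p=p)
    return rng.uniform(x[k], x[k + 1])          # uniform draw from chosen interval
```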
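On the Software Dependencies row: scipy.special.logsumexp and the cited racing sampling method both address the same numerical issue, namely sampling an index from unnormalized log-probabilities whose exponentials would underflow. A minimal sketch of one standard stable approach, the Gumbel-max trick, shown here as an illustration rather than the paper's exact racing procedure:

```python
import numpy as np
from scipy.special import logsumexp

def sample_from_log_weights(log_w, rng=None):
    # Gumbel-max trick: argmax(log_w + Gumbel noise) returns index i with
    # probability exp(log_w[i]) / sum_j exp(log_w[j]), without ever
    # exponentiating the (possibly very negative) log-weights.
    rng = np.random.default_rng() if rng is None else rng
    g = rng.gumbel(size=len(log_w))
    return int(np.argmax(log_w + g))

# logsumexp normalizes the same log-weights stably when the actual
# log-probabilities are needed:
log_w = np.array([-1000.0, -1001.0, -999.5])
log_p = log_w - logsumexp(log_w)
```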
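The "evenly spaced" quantiles in the Experiment Setup row follow the pattern q_j = j/(m+1) for j = 1, ..., m, which reproduces the quoted examples (m = 1 gives the median, m = 2 gives the 33rd and 67th percentiles). A short sketch; the helper name and the raw page counts are ours:

```python
import numpy as np

def evenly_spaced_quantiles(m):
    # m = 1 -> [0.5] (the median); m = 2 -> [1/3, 2/3],
    # i.e., roughly the 33rd and 67th percentiles.
    return [(j + 1) / (m + 1) for j in range(m)]

print(evenly_spaced_quantiles(1))  # [0.5]
print(evenly_spaced_quantiles(2))  # [0.333..., 0.666...]

# Goodreads page counts are divided by 100 to land in [-100, 100]:
pages = np.array([320.0, 1040.0, 212.0])  # hypothetical raw page counts
scaled = np.clip(pages / 100.0, -100, 100)
```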