Differentially Private Approximate Quantiles
Authors: Haim Kaplan, Shachar Schnapp, Uri Stemmer
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We give a worst-case upper bound on its error, and show that its error is much lower than that of previous implementations on several different datasets. Furthermore, it gets this low error while running two orders of magnitude faster than the best previous implementation. ... We experimentally evaluated the AQ algorithm and conclude that it obtains higher accuracy and faster runtime than the existing state-of-the-art implementations (Section 4). |
| Researcher Affiliation | Collaboration | ¹Tel Aviv University, ²Google Research, ³Ben-Gurion University. ... Haim Kaplan and Uri Stemmer are partially supported by the Israel Science Foundation (grants 1595/19 and 1871/19) and by Len Blavatnik and the Blavatnik Family foundation. Shachar Schnapp is partially supported by the Cyber Security Research Center at Ben-Gurion University of the Negev. |
| Pseudocode | Yes | Algorithm 1 Approximate Quantiles (AQ). Input: domain D = (a, b), database X = (x₁, …, xₙ) ∈ Dⁿ, quantiles Q = (q₁, …, qₘ), privacy parameter ε. (A hedged Python sketch of the recursion appears after the table.) |
| Open Source Code | Yes | We implemented the AQ algorithm in Python and its code is publicly available on GitHub: https://github.com/ShacharSchnapp/DP_AQ |
| Open Datasets | Yes | Two real continuous datasets from Goodreads (2019), one containing book ratings and the other containing book page counts. Last, we have two categorical datasets from the Adult census data (Dua & Graff, 2019). ... Goodreads. Goodreads-books dataset, 2019. URL https://www.kaggle.com/jealousleopard/. Date accessed: 2021-07-01. ... Dua, D. and Graff, C. UCI Machine Learning Repository, 2019. URL http://archive.ics.uci.edu/ml. Date accessed: 2021-07-01. |
| Dataset Splits | No | The paper mentions selecting '1000 samples from each dataset' for experiments but does not provide specific train/validation/test splits, percentages, or methodology for data partitioning. It describes the data sampling for empirical analysis rather than explicitly defining dataset splits. |
| Hardware Specification | Yes | each experiment used one core of an Intel i9-9900K processor. |
| Software Dependencies | No | The paper states 'We implemented the AQ algorithm in Python'. However, it does not specify the Python version used, nor does it list the specific libraries (e.g., NumPy, SciPy, PyTorch) and version numbers that would be needed for replication. |
| Experiment Setup | Yes | We randomly chose 1000 samples from each dataset and checked the error of each algorithm with m = 1 to m = 120 uniform quantiles and D = [−100, 100] as a (loose) user-provided domain. We used the privacy parameter ε = 1. This process was repeated 100 times. ... All algorithms were ρ-zCDP for ρ = 1/8. For this we used ε = 1/√m in each application of the exponential mechanism by AppIndExp, and a Laplace noise of magnitude ε = 1/√h in each node of the tree computed by AggTree. In each application of the exponential mechanism by Approximate Quantiles we used ε = 1/√(log m + 1), as described in Theorem 3.12. (A sketch of this budget arithmetic appears after the table.) |
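
For intuition about the pseudocode quoted above, here is a minimal Python sketch of the AQ recursion, assuming the standard interval-based exponential mechanism for a single quantile. The function names, the empirical-count rescaling of the sub-quantiles, and the numerical details are our assumptions and may differ from the paper's exact formulation; see the linked repository for the authors' implementation.

```python
import numpy as np

def single_quantile_em(x_sorted, a, b, q, eps, rng):
    """Exponential mechanism for one quantile (standard construction):
    candidates are the n+1 gaps between consecutive sorted points (plus
    the domain endpoints); gap i has i points below it, so its utility
    is -|i - q*n|, and it is sampled with weight length * exp(eps*u/2)."""
    n = len(x_sorted)
    edges = np.concatenate(([a], np.asarray(x_sorted, dtype=float), [b]))
    gaps = np.diff(edges)                        # widths of the n+1 candidate intervals
    utility = -np.abs(np.arange(n + 1) - q * n)  # rank error of each interval
    with np.errstate(divide="ignore"):           # zero-width gaps get weight 0
        logw = np.log(gaps) + eps * utility / 2.0
    logw -= logw.max()                           # stabilize before exponentiating
    p = np.exp(logw)
    p /= p.sum()
    i = rng.choice(n + 1, p=p)
    return rng.uniform(edges[i], edges[i + 1])   # uniform inside the chosen gap

def aq(x, a, b, quantiles, eps_per_call, rng):
    """Recursive Approximate Quantiles sketch: privately estimate the
    middle quantile, split the data at the estimate, rescale the
    remaining quantile ranks to the two halves, and recurse."""
    if not quantiles:
        return []
    x = np.sort(np.asarray(x, dtype=float))
    n = max(len(x), 1)
    mid = len(quantiles) // 2
    v = single_quantile_em(x, a, b, quantiles[mid], eps_per_call, rng)
    left, right = x[x < v], x[x >= v]
    nl, nr = max(len(left), 1), max(len(right), 1)
    # rescale target ranks p*n to quantiles within each half (clipped to [0, 1])
    ql = [min(max(p * n / nl, 0.0), 1.0) for p in quantiles[:mid]]
    qr = [min(max((p * n - len(left)) / nr, 0.0), 1.0) for p in quantiles[mid + 1:]]
    return (aq(left, a, v, ql, eps_per_call, rng)
            + [v]
            + aq(right, v, b, qr, eps_per_call, rng))

rng = np.random.default_rng(0)
data = rng.normal(size=1000)
eps_per_call = 1 / np.sqrt(np.log2(3) + 1)   # log m + 1 levels for m = 3 quantiles
print(aq(data, -100, 100, [0.25, 0.5, 0.75], eps_per_call, rng))
```

Each data point participates in at most one exponential-mechanism call per level of the recursion, and the recursion has about log m + 1 levels, which is why the per-call budget above divides by log m + 1 rather than m.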
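
The per-call ε values in the experiment setup follow from zCDP composition. The following is a minimal sketch of that arithmetic; the ε²/8 zCDP cost of an ε-DP exponential-mechanism call, the base-2 logarithm, and the tree height h = 20 are our assumptions, not values quoted from the paper.

```python
import math

rho = 1 / 8   # total zCDP budget, as quoted above
m = 120       # largest number of quantiles in the quoted experiments
h = 20        # AggTree height -- placeholder, the paper's h is not quoted here

# An eps-DP exponential-mechanism call costs eps^2/8 in zCDP, so k such
# calls at eps = 1/sqrt(k) compose to exactly k * (1/k) / 8 = 1/8 = rho.
eps_app_ind_exp = 1 / math.sqrt(m)                # m independent EM calls
eps_agg_tree = 1 / math.sqrt(h)                   # one noisy release per tree level
eps_aq = 1 / math.sqrt(math.log2(m) + 1)          # log m + 1 recursion levels in AQ

print(eps_app_ind_exp, eps_agg_tree, eps_aq)
```

Under this accounting, AQ's per-call budget shrinks only logarithmically in m, whereas AppIndExp's shrinks like 1/√m, which is consistent with the accuracy advantage the paper reports.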