Differentially Private Approximate Quantiles
Authors: Haim Kaplan, Shachar Schnapp, Uri Stemmer
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We give a worst-case upper bound on its error, and show that its error is much lower than that of previous implementations on several different datasets. Furthermore, it gets this low error while running two orders of magnitude faster than the best previous implementation. ... We experimentally evaluated the AQ algorithm and conclude that it obtains higher accuracy and faster runtime than the existing state-of-the-art implementations (Section 4). |
| Researcher Affiliation | Collaboration | ¹Tel Aviv University, ²Google Research, ³Ben-Gurion University. ... Haim Kaplan and Uri Stemmer are partially supported by the Israel Science Foundation (grants 1595/19 and 1871/19) and by Len Blavatnik and the Blavatnik Family foundation. Shachar Schnapp is partially supported by the Cyber Security Research Center at Ben-Gurion University of the Negev. |
| Pseudocode | Yes | Algorithm 1 Approximate Quantiles (AQ). Input: domain D = (a, b), database X = (x₁, …, xₙ) ∈ Dⁿ, quantiles Q = (q₁, …, qₘ), privacy parameter ε. (A hedged Python sketch of the recursion appears after the table.) |
| Open Source Code | Yes | We implemented the AQ algorithm in Python and its code is publicly available on GitHub: https://github.com/ShacharSchnapp/DP_AQ |
| Open Datasets | Yes | Two real continuous datasets from Goodreads (2019), one containing book ratings and the other containing book page counts. Last, we have two categorical datasets from the Adult census data (Dua & Graff, 2019). ... Goodreads. Goodreads-books dataset, 2019. URL https://www.kaggle.com/jealousleopard/. Date accessed: 2021-07-01. ... Dua, D. and Graff, C. UCI Machine Learning Repository, 2019. URL http://archive.ics.uci.edu/ml. Date accessed: 2021-07-01. |
| Dataset Splits | No | The paper mentions selecting '1000 samples from each dataset' for experiments but does not provide specific train/validation/test splits, percentages, or methodology for data partitioning. It describes the data sampling for empirical analysis rather than explicitly defining dataset splits. |
| Hardware Specification | Yes | each experiment used one core of an Intel i9-9900K processor. |
| Software Dependencies | No | The paper states 'We implemented the AQ algorithm in Python'. However, it does not specify the Python version used, nor does it list the specific libraries (e.g., NumPy, SciPy, PyTorch) and version numbers that would be needed for replication. |
| Experiment Setup | Yes | We randomly chose 1000 samples from each dataset and checked the error of each algorithm with m = 1 to m = 120 uniform quantiles and D = [−100, 100] as a (loose) user-provided domain. We used the privacy parameter ε = 1. This process was repeated 100 times. ... All algorithms were ρ-zCDP for ρ = 1/8. For this we used ε = 1/√m in each application of the exponential mechanism by AppIndExp, and a Laplace noise of magnitude ε = 1/√h in each node of the tree computed by AggTree. In each application of the exponential mechanism by Approximate Quantiles we used ε = 1/√(log m + 1), as described in Theorem 3.12. (A sketch of this budget arithmetic appears after the table.) |
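
For intuition about the pseudocode quoted above, here is a minimal Python sketch of the AQ recursion, assuming the standard interval-based exponential mechanism for a single quantile. The function names, the empirical-count rescaling of the sub-quantiles, and the numerical details are our assumptions and may differ from the paper's exact formulation; see the linked repository for the authors' implementation.

```python
import numpy as np

def single_quantile_em(x_sorted, a, b, q, eps, rng):
    """Exponential mechanism for one quantile (standard construction):
    candidates are the n+1 gaps between consecutive sorted points (plus
    the domain endpoints); gap i has i points below it, so its utility
    is -|i - q*n|, and it is sampled with weight length * exp(eps*u/2)."""
    n = len(x_sorted)
    edges = np.concatenate(([a], np.asarray(x_sorted, dtype=float), [b]))
    gaps = np.diff(edges)                        # widths of the n+1 candidate intervals
    utility = -np.abs(np.arange(n + 1) - q * n)  # rank error of each interval
    with np.errstate(divide="ignore"):           # zero-width gaps get weight 0
        logw = np.log(gaps) + eps * utility / 2.0
    logw -= logw.max()                           # stabilize before exponentiating
    p = np.exp(logw)
    p /= p.sum()
    i = rng.choice(n + 1, p=p)
    return rng.uniform(edges[i], edges[i + 1])   # uniform inside the chosen gap

def aq(x, a, b, quantiles, eps_per_call, rng):
    """Recursive Approximate Quantiles sketch: privately estimate the
    middle quantile, split the data at the estimate, rescale the
    remaining quantile ranks to the two halves, and recurse."""
    if not quantiles:
        return []
    x = np.sort(np.asarray(x, dtype=float))
    n = max(len(x), 1)
    mid = len(quantiles) // 2
    v = single_quantile_em(x, a, b, quantiles[mid], eps_per_call, rng)
    left, right = x[x < v], x[x >= v]
    nl, nr = max(len(left), 1), max(len(right), 1)
    # rescale target ranks p*n to quantiles within each half (clipped to [0, 1])
    ql = [min(max(p * n / nl, 0.0), 1.0) for p in quantiles[:mid]]
    qr = [min(max((p * n - len(left)) / nr, 0.0), 1.0) for p in quantiles[mid + 1:]]
    return (aq(left, a, v, ql, eps_per_call, rng)
            + [v]
            + aq(right, v, b, qr, eps_per_call, rng))

rng = np.random.default_rng(0)
data = rng.normal(size=1000)
eps_per_call = 1 / np.sqrt(np.log2(3) + 1)   # log m + 1 levels for m = 3 quantiles
print(aq(data, -100, 100, [0.25, 0.5, 0.75], eps_per_call, rng))
```

Each data point participates in at most one exponential-mechanism call per level of the recursion, and the recursion has about log m + 1 levels, which is why the per-call budget above divides by log m + 1 rather than m.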
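
The per-call ε values in the experiment setup follow from zCDP composition. The following is a minimal sketch of that arithmetic; the ε²/8 zCDP cost of an ε-DP exponential-mechanism call, the base-2 logarithm, and the tree height h = 20 are our assumptions, not values quoted from the paper.

```python
import math

rho = 1 / 8   # total zCDP budget, as quoted above
m = 120       # largest number of quantiles in the quoted experiments
h = 20        # AggTree height -- placeholder, the paper's h is not quoted here

# An eps-DP exponential-mechanism call costs eps^2/8 in zCDP, so k such
# calls at eps = 1/sqrt(k) compose to exactly k * (1/k) / 8 = 1/8 = rho.
eps_app_ind_exp = 1 / math.sqrt(m)                # m independent EM calls
eps_agg_tree = 1 / math.sqrt(h)                   # one noisy release per tree level
eps_aq = 1 / math.sqrt(math.log2(m) + 1)          # log m + 1 recursion levels in AQ

print(eps_app_ind_exp, eps_agg_tree, eps_aq)
```

Under this accounting, AQ's per-call budget shrinks only logarithmically in m, whereas AppIndExp's shrinks like 1/√m, which is consistent with the accuracy advantage the paper reports.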