Practical Locally Private Heavy Hitters

Authors: Raef Bassily, Kobbi Nissim, Uri Stemmer, Abhradeep Guha Thakurta

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implemented Algorithm TreeHist to verify our theoretical analysis and compared its performance with the performance of Google's RAPPOR code. Our measurements are performed with a setting that is favorable to RAPPOR (i.e., a small input domain), yet they indicate that Algorithm TreeHist performs better than RAPPOR in terms of noise level.
Researcher Affiliation | Academia | Department of Computer Science & Engineering, The Ohio State University, bassily.1@osu.edu; Department of Computer Science, Georgetown University, kobbi.nissim@georgetown.edu; Center for Research on Computation and Society (CRCS), Harvard University, u@uri.co.il; Department of Computer Science, University of California Santa Cruz, aguhatha@ucsc.edu.
Pseudocode | No | The paper describes the protocols and algorithms in detailed text sections (e.g., 'The TreeHist Protocol', 'A local randomizer: LocalRnd', 'A frequency oracle: FreqOracle') but does not include formal pseudocode blocks or sections explicitly labeled 'Algorithm'. (See the randomized-response sketch after the table.)
Open Source Code | No | The paper mentions comparing their implementation with 'the only publicly available code for locally private frequency estimation' (RAPPOR) and provides its GitHub link (https://github.com/google/rappor). However, it does not state that the code for the paper's own algorithms (TreeHist, Bitstogram) is publicly available or provide a link to it.
Open Datasets | Yes | We ran the experiment (Figure 1) on a data set drawn uniformly at random from the NLTK Brown corpus [1]. [1] NLTK Brown Corpus, www.nltk.org. (See the sampling sketch after the table.)
Dataset Splits | No | The paper does not provide specific dataset split information (e.g., exact percentages, sample counts, or citations to predefined splits) for training, validation, or testing, beyond mentioning the NLTK Brown corpus.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments, only abstract measures such as 'server running time' and 'user running time'.
Software Dependencies | No | The paper mentions comparing with the 'RAPPOR code base' and the snapshot date of that code, but it does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks) used for its own implementation.
Experiment Setup | Yes | The data set we created has n = 10 million samples drawn i.i.d. from the corpus with replacement (which corresponds to 25,991 unique words), and the system parameters are chosen as follows: number of data samples (n): 10 million, range of the hash function (m): √n, number of hash functions (t): 285. For the hash functions, we used the prefix bits of SHA-256. [...] with the same privacy parameter ϵ = ln(3), the number of data samples n = 1 million, and the data set to be the same data set generated by the demo.sh script. [...] We set a threshold of 15√n as the threshold for being a heavy hitter. (See the parameter sketch after the table.)
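
Because LocalRnd is described only in prose, the following is a minimal sketch of generic one-bit randomized response, the standard primitive that local randomizers of this kind build on. It is not the paper's LocalRnd; the function name and structure are illustrative assumptions, and only the privacy parameter ϵ = ln(3) is taken from the paper.

```python
import math
import random

def randomized_response(bit: int, epsilon: float) -> int:
    """Generic one-bit randomized response (not the paper's LocalRnd).

    The true bit is reported with probability e^eps / (e^eps + 1) and flipped
    otherwise, which satisfies eps-local differential privacy.
    """
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < keep_prob else 1 - bit

# With epsilon = ln(3), as in the paper's experiments, each bit is kept
# with probability 3/4 and flipped with probability 1/4.
epsilon = math.log(3)
reports = [randomized_response(b, epsilon) for b in (0, 1, 1, 0, 1)]
```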
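
For the Open Datasets row, here is a minimal sketch of how a data set of this kind could be drawn from the NLTK Brown corpus, assuming the standard nltk package; it is not the authors' data-generation code, and the variable names are illustrative.

```python
# Sketch: draw n i.i.d. samples with replacement from the NLTK Brown corpus.
import random
import nltk

nltk.download("brown")                 # fetch the corpus if not already cached
from nltk.corpus import brown

words = list(brown.words())            # full token sequence of the corpus
n = 10_000_000                         # number of data samples quoted in the paper
samples = random.choices(words, k=n)   # uniform i.i.d. draws with replacement
print(f"{len(set(samples))} unique words in the sampled data set")
```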
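
For the Experiment Setup row, here is a minimal sketch of constructing t = 285 hash functions with range m from SHA-256 prefix bits, using the quoted parameters; the salting scheme and the exact number of prefix bits taken are assumptions made here for illustration, not details from the paper.

```python
# Sketch: t hash functions with range m derived from SHA-256 prefix bits.
import hashlib
import math

n = 10_000_000           # number of data samples
m = math.isqrt(n)        # hash range, sqrt(n) as quoted in the setup
t = 285                  # number of hash functions

def make_hash(index: int):
    """Return the index-th hash function, mapping a word into [0, m)."""
    def h(word: str) -> int:
        digest = hashlib.sha256(f"{index}:{word}".encode()).digest()
        prefix = int.from_bytes(digest[:8], "big")  # leading (prefix) bits
        return prefix % m
    return h

hashes = [make_hash(i) for i in range(t)]
print(hashes[0]("privacy"), hashes[1]("privacy"))  # two independent hash values
```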