Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Discrete Distribution Estimation under Local Privacy

Authors: Peter Kairouz, Keith Bonawitz, Daniel Ramage

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Large scale simulations show that the optimal decoding algorithm for both k-RR and RAPPOR depends on the shape of the true underlying distribution. For skewed distributions, the projected estimator (introduced here) offers the best utility across a wide variety of privacy levels and sample sizes (Section 4.4).
Researcher Affiliation | Collaboration | Peter Kairouz EMAIL Keith Bonawitz EMAIL Daniel Ramage EMAIL Google, 1600 Amphitheatre Parkway, Mountain View, CA 94043, University of Illinois, Urbana-Champaign, 1308 W Main St, Urbana, IL 61801
Pseudocode | No | The paper references "Algorithm 1 of (Wang & Carreira-Perpiñán, 2013)" but does not contain structured pseudocode or algorithm blocks within its own text.
Open Source Code | No | The paper mentions that RAPPOR is an "open source Google technology" but does not state that the authors are releasing their own code for the methods described in this paper (k-RR, O-RR).
Open Datasets | No | The paper describes generating input data from various statistical distributions (e.g., "geometric distribution", "binomial distributions", "Zipf distribution", "multinomial distributions drawn from a symmetric Dirichlet distribution") for simulations, but does not refer to or provide access information for a publicly available or open dataset.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. It focuses on simulating data from distributions.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment.
Experiment Setup | Yes | Free parameters are set via grid search over k ∈ [2, 4, 8, ..., 2048, 4096], c ∈ [1, 2, 4, ..., 512, 1024], h ∈ [1, 2, 4, 8, 16] for each ε.
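
The grid quoted in the Experiment Setup row can be enumerated as in the minimal Python sketch below. It assumes the elided entries are the intermediate powers of two, which the listed endpoints suggest. The `score` callback, the list of ε values, and the convention that a lower score is better are hypothetical placeholders; the paper's own objective and ε values are not given in this summary.

```python
from itertools import product


def powers_of_two(lo, hi):
    """Return [lo, 2*lo, 4*lo, ..., hi]; lo and hi are assumed to be powers of two."""
    values = []
    v = lo
    while v <= hi:
        values.append(v)
        v *= 2
    return values


# Grids as quoted in the Experiment Setup row (intermediate values assumed).
K_GRID = powers_of_two(2, 4096)   # 2, 4, 8, ..., 4096
C_GRID = powers_of_two(1, 1024)   # 1, 2, 4, ..., 1024
H_GRID = powers_of_two(1, 16)     # 1, 2, 4, 8, 16


def grid_search(score, epsilons):
    """For each epsilon, return the (k, c, h) triple with the lowest score.

    `score(eps, k, c, h)` is a caller-supplied, hypothetical loss function;
    "lower is better" is an assumption of this sketch.
    """
    best = {}
    for eps in epsilons:
        best[eps] = min(
            product(K_GRID, C_GRID, H_GRID),
            key=lambda params: score(eps, *params),
        )
    return best
```

Under these assumptions the full grid has 12 × 11 × 5 = 660 combinations per ε, so an exhaustive search is feasible as long as each evaluation is cheap.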