Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Discrete Distribution Estimation under Local Privacy
Authors: Peter Kairouz, Keith Bonawitz, Daniel Ramage
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Large scale simulations show that the optimal decoding algorithm for both k-RR and RAPPOR depends on the shape of the true underlying distribution. For skewed distributions, the projected estimator (introduced here) offers the best utility across a wide variety of privacy levels and sample sizes (Section 4.4). |
| Researcher Affiliation | Collaboration | Peter Kairouz, Keith Bonawitz, Daniel Ramage. Google, 1600 Amphitheatre Parkway, Mountain View, CA 94043; University of Illinois, Urbana-Champaign, 1308 W Main St, Urbana, IL 61801 |
| Pseudocode | No | The paper references "Algorithm 1 of (Wang & Carreira-Perpiñán, 2013)" but does not contain structured pseudocode or algorithm blocks within its own text. |
| Open Source Code | No | The paper mentions that RAPPOR is an "open source Google technology" but does not state that the authors are releasing their own code for the methods described in this paper (k-RR, O-RR). |
| Open Datasets | No | The paper describes generating input data from various statistical distributions (e.g., "geometric distribution", "binomial distributions", "Zipf distribution", "multinomial distributions drawn from a symmetric Dirichlet distribution") for simulations, but does not refer to or provide access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. It focuses on simulating data from distributions. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment. |
| Experiment Setup | Yes | Free parameters are set via grid search over k ∈ [2, 4, 8, ..., 2048, 4096], c ∈ [1, 2, 4, ..., 512, 1024], h ∈ [1, 2, 4, 8, 16] for each ε. |
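
The grid in the Experiment Setup row can be enumerated as in the minimal sketch below. It assumes that the ellipses denote successive powers of two and that the three parameters are searched jointly, neither of which this summary confirms. The names `powers_of_two`, `parameter_grid`, `grid_search`, and the `score` callable (lower is better) are hypothetical and are not taken from the paper.

```python
from itertools import product


def powers_of_two(lo, hi):
    """Return [lo, 2*lo, 4*lo, ...] up to and including hi."""
    values = []
    v = lo
    while v <= hi:
        values.append(v)
        v *= 2
    return values


# Assumption: the "..." in each range stands for successive powers of two.
K_GRID = powers_of_two(2, 4096)   # 2, 4, 8, ..., 4096
C_GRID = powers_of_two(1, 1024)   # 1, 2, 4, ..., 1024
H_GRID = [1, 2, 4, 8, 16]         # listed in full in the table


def parameter_grid():
    """All (k, c, h) combinations, assuming a joint search over the three."""
    return list(product(K_GRID, C_GRID, H_GRID))


def grid_search(score, epsilon):
    """Return the (k, c, h) triple that minimizes a user-supplied score.

    `score(k, c, h, epsilon)` is a hypothetical callable, for example an
    estimation loss measured in a simulation; lower is treated as better.
    """
    return min(parameter_grid(), key=lambda p: score(p[0], p[1], p[2], epsilon))
```

Repeating `grid_search` for each privacy level ε mirrors the "for each ε" wording in the row; the loss used to rank configurations is left to the caller because this summary does not state it.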