Differentially Private Learning of Structured Discrete Distributions

Authors: Ilias Diakonikolas, Moritz Hardt, Ludwig Schmidt

NeurIPS 2015

Reproducibility assessment: each variable below lists the assessed result followed by the supporting LLM response.

Research Type: Experimental

"We complement our theoretical guarantees with an experimental evaluation. Our experiments illustrate the speed and accuracy of our private estimators on both synthetic mixture models and a large public data set." From Section 5 (Experiments): "In addition to our theoretical results from the previous sections, we also investigate the empirical performance of our private distribution learning algorithm based on the maximum error rule."

Researcher Affiliation: Collaboration

Ilias Diakonikolas (University of Edinburgh); Moritz Hardt (Google Research); Ludwig Schmidt (MIT)

Pseudocode: Yes

"See Figure 2 for a pseudocode of our algorithm." Figure 2 (Maximum Error Rule, MERR):

    MAXIMUMERRORRULE(S ∈ [N]^n, privacy parameters ϵ, δ)
        For t = 1 to T:
            1. I = FINDBADINTERVAL(A_{t-1}, S)
            2. A_t = UPDATE(A_{t-1}, S, I)

    FINDBADINTERVAL(A_{t-1}, S)
        1. Let 𝓘 be the collection of all dyadic intervals of the domain.
        2. For each J ∈ 𝓘, let q(J; S) = |weight(J, A_{t-1}) - weight(J, S)|.
        3. Output an I ∈ 𝓘 sampled from the choosing mechanism with score function q over the collection 𝓘, with privacy parameters (ϵ/2T, δ/T).

    UPDATE(A_{t-1}, S, I)
        1. Let I = (l, r) be the input interval. Compute w_l = weight([1, l], S) + Laplace(0, 1/(2nϵ)) and w_r = weight([l+1, r], S) + Laplace(0, 1/(2nϵ)).
        2. Output the CDF obtained from A_{t-1} by adding the points (l, w_l) and (r, w_l + w_r) to the graph of A_{t-1}.

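For readers who want to run the procedure, below is a minimal Python sketch of MERR under stated assumptions: the initial hypothesis A_0 is taken to be the uniform CDF, the domain size N is a power of two, and the paper's choosing mechanism is replaced by the plain exponential mechanism (so the δ budget goes unused here). Names such as weight_sample and choose_interval are illustrative, not from the paper.

    import numpy as np

    def weight_sample(interval, samples):
        # Empirical probability mass the sample places on [lo, hi] (inclusive).
        lo, hi = interval
        return float(np.mean((samples >= lo) & (samples <= hi)))

    def weight_cdf(interval, cdf_x, cdf_y):
        # Mass the hypothesis CDF (piecewise linear through its support points)
        # places on [lo, hi]: F(hi) - F(lo - 1).
        lo, hi = interval
        return float(np.interp(hi, cdf_x, cdf_y) - np.interp(lo - 1, cdf_x, cdf_y))

    def dyadic_intervals(N):
        # All dyadic sub-intervals of [1, N]; assumes N is a power of two.
        out, length = [], N
        while length >= 1:
            out += [(s, s + length - 1) for s in range(1, N + 1, length)]
            length //= 2
        return out

    def choose_interval(scores, eps, sensitivity):
        # Stand-in for the paper's choosing mechanism: the exponential mechanism,
        # sampling index i with probability proportional to
        # exp(eps * score_i / (2 * sensitivity)).
        logits = eps * np.asarray(scores) / (2.0 * sensitivity)
        logits -= logits.max()  # for numerical stability
        probs = np.exp(logits)
        return int(np.random.choice(len(scores), p=probs / probs.sum()))

    def merr(samples, N, eps, T):
        # Maximum Error Rule sketch. The hypothesis CDF A_t is stored as support
        # points (cdf_x, cdf_y); A_0 is assumed uniform (not specified in the quote).
        n = len(samples)
        cdf_x, cdf_y = np.array([0.0, float(N)]), np.array([0.0, 1.0])
        intervals = dyadic_intervals(N)
        for _ in range(T):
            # FINDBADINTERVAL: privately pick an interval where hypothesis and
            # empirical distribution disagree; each score has sensitivity 1/n.
            scores = [abs(weight_cdf(J, cdf_x, cdf_y) - weight_sample(J, samples))
                      for J in intervals]
            l, r = intervals[choose_interval(scores, eps / (2 * T), 1.0 / n)]
            # UPDATE: noisy masses of [1, l] and [l+1, r], Laplace scale as in Figure 2.
            w_l = weight_sample((1, l), samples) + np.random.laplace(0.0, 1.0 / (2 * n * eps))
            w_r = weight_sample((l + 1, r), samples) + np.random.laplace(0.0, 1.0 / (2 * n * eps))
            # Add the points (l, w_l) and (r, w_l + w_r) to the graph of the CDF.
            for x, y in ((float(l), w_l), (float(r), w_l + w_r)):
                i = int(np.searchsorted(cdf_x, x))
                if i < len(cdf_x) and cdf_x[i] == x:
                    cdf_y[i] = y
                else:
                    cdf_x = np.insert(cdf_x, i, x)
                    cdf_y = np.insert(cdf_y, i, y)
        return cdf_x, cdf_y

The exponential-mechanism substitution is the main simplification: the paper relies on the choosing mechanism for better utility in this selection step, so the sketch is a structural illustration rather than a faithful reimplementation.
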
Open Source Code: No

The paper states, "We implemented our algorithm in the Julia programming language (v0.3)", but provides no link or explicit statement that the code is open source or publicly available.

Open Datasets: Yes

"We use the lepton p_T (transverse momentum) feature of the Higgs data set (see Figure 2e of [29])." [29] Pierre Baldi, Peter Sadowski, and Daniel Whiteson. Searching for exotic particles in high-energy physics with deep learning. Nature Communications, 5, 2014.

Dataset Splits: No

The paper mentions using synthetic data and the Higgs data set but does not give specific train/validation/test splits (e.g., percentages, exact counts, or a split methodology). For the Higgs data it states: "we interpret them as discrete values in [N] for N = 10^18. We then generate a sample from this data set by taking the first n samples and pass this subset as input to our private distribution learning algorithm." This does not constitute a standard train/validation split for reproducibility.

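For concreteness, here is a hedged Python sketch of the subsampling step in that quote. The file name, column index, and the exact mapping from floating-point p_T values to integers in [N] are assumptions (the paper only says the values are interpreted as discrete); only N = 10^18 and the "first n samples" rule come from the quote.

    import numpy as np

    N = 10**18  # domain size stated in the paper
    n = 10**6   # illustrative sample size, not from the paper

    # Assumption: a local copy of the HIGGS data set; the file name and the
    # column index (lepton p_T) are hypothetical placeholders.
    pt = np.loadtxt("HIGGS.csv", delimiter=",", usecols=[1], max_rows=n)

    # "we interpret them as discrete values in [N]": the exact mapping is not
    # given in the quote, so rescaling into [1, N] is only one plausible reading.
    discrete = np.clip(np.round(pt / pt.max() * (N - 1)).astype(np.int64) + 1, 1, N)

    # "taking the first n samples and pass this subset as input"
    subset = discrete[:n]
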
Hardware Specification: Yes

"We implemented our algorithm in the Julia programming language (v0.3) and ran the experiments on an Intel Core i5-4690K CPU (3.5-3.9 GHz, 6 MB cache)."

Software Dependencies: Yes

The same sentence is the only stated software dependency: "We implemented our algorithm in the Julia programming language (v0.3) and ran the experiments on an Intel Core i5-4690K CPU (3.5-3.9 GHz, 6 MB cache)."

Experiment Setup: Yes

"In all experiments involving our private learning algorithm, we set the privacy parameters to ϵ = 1 and δ = 1/n." The number of steps taken by the maximum error rule influences the learning error: a smaller number of steps leads to a better approximation for small n, while more samples allow a better error with a larger number of steps. "Even for the largest sample size n = 10^7 and the largest number of MERR steps, our algorithm runs in less than 5 seconds."

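To connect this configuration to the MERR sketch above, here is a hedged usage example wiring in ϵ = 1 and δ = 1/n as reported; the domain size, the synthetic data, and the step count T are illustrative choices, not values from the paper.

    import numpy as np

    n = 10**5                          # illustrative; the paper goes up to n = 10^7
    eps, delta = 1.0, 1.0 / n          # privacy parameters reported in the paper
    # Synthetic heavy-tailed data over the domain [1, 1024] (an illustrative choice).
    samples = np.clip(np.random.zipf(2.0, size=n), 1, 1024)

    # T, the number of MERR steps, trades approximation error against the
    # per-step budget eps/(2T); T = 10 is a guess, not the paper's setting.
    cdf_x, cdf_y = merr(samples, N=1024, eps=eps, T=10)
    # Note: delta is unused because the sketch substitutes the exponential
    # mechanism for the paper's choosing mechanism.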