Differentially Private Markov Chain Monte Carlo

Authors: Mikko Heikkilä, Joonas Jälkö, Onur Dikmen, Antti Honkela

NeurIPS 2019

Reproducibility Variable Result LLM Response
Research Type Experimental In order to demonstrate our proposed method in practice, we use a simple 2-dimensional Gaussian mixture model... We use b = 1000 for the minibatches, and adjust the temperature of the chain s.t. N0 = 100 in (23)... As shown in Figure 2, the samples from the tempered chain with DP are nearly indistinguishable from the samples drawn from the non-private tempered chain. We also compared our method against DP stochastic gradient Langevin dynamics (DP SGLD) method of Li et al. [2019]. Figure 3 illustrates how the accuracy is affected by privacy.
Researcher Affiliation Academia Mikko A. Heikkilä Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics University of Helsinki, Helsinki, Finland mikko.a.heikkila@helsinki.fi; Joonas Jälkö Helsinki Institute for Information Technology HIIT, Department of Computer Science Aalto University, Espoo, Finland joonas.jalko@aalto.fi; Onur Dikmen Center for Applied Intelligent Systems Research (CAISR) Halmstad University, Halmstad, Sweden onur.dikmen@hh.se; Antti Honkela Helsinki Institute for Information Technology HIIT, Department of Computer Science University of Helsinki, Helsinki, Finland antti.honkela@helsinki.fi
Pseudocode No The paper describes algorithms in text and mathematical formulas but does not include structured pseudocode or algorithm blocks.
Open Source Code Yes The code for running all the experiments is available at https://github.com/DPBayes/DP-MCMC-NeurIPS2019.
Open Datasets Yes In order to demonstrate our proposed method in practice, we use a simple 2-dimensional Gaussian mixture model, that has been used by Welling and Teh [2011] and Seita et al. [2017] in the non-private setting: θ_j ∼ N(0, σ_j²), j = 1, 2; x_i ∼ 0.5 N(θ_1, σ_x²) + 0.5 N(θ_1 + θ_2, σ_x²) (25), where σ_1² = 10, σ_2² = 1, σ_x² = 2. For the observed data, we use fixed parameter values θ = (0, 1). Following Seita et al. [2017], we generate 10^6 samples from the model to use as training data.
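As a concrete illustration of the data-generating process quoted above, the following sketch samples synthetic training data from the two-component mixture with the stated hyperparameters. The variable names are our own; the paper's released code (linked above) is the authoritative implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyperparameters from the quoted model: sigma_1^2 = 10, sigma_2^2 = 1, sigma_x^2 = 2
sigma_x_sq = 2.0

# Fixed parameter values used for the observed data: theta = (0, 1)
theta1, theta2 = 0.0, 1.0

# Draw 10^6 samples from the mixture
#   x_i ~ 0.5 N(theta_1, sigma_x^2) + 0.5 N(theta_1 + theta_2, sigma_x^2)
n = 10**6
pick_second = rng.random(n) < 0.5                  # mixture component indicator
means = np.where(pick_second, theta1 + theta2, theta1)
x = rng.normal(loc=means, scale=np.sqrt(sigma_x_sq))
```

The resulting sample has mean ≈ 0.5 (the average of the two component means) and variance ≈ σ_x² + 0.25 from the between-component spread.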
Dataset Splits No The paper mentions 'training data' and burning-in iterations, but it does not specify explicit validation splits or cross-validation details.
Hardware Specification No The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory details) used for running experiments.
Software Dependencies No The paper mentions different methods and refers to a GitHub repository for code, but it does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup Yes We use b = 1000 for the minibatches, and adjust the temperature of the chain s.t. N0 = 100 in (23). This corresponds to the temperature used by Seita et al. [2017] in their non-private test. To simulate this effect, we use the differentially private variational inference (DPVI) introduced by Jälkö et al. [2017] with a small privacy budget (0.22, 10⁻⁶) to find a rough estimate for the initial location. The DP MCMC method was burned in for 1 000 iterations and DP SGLD for 100 000 iterations.
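To make the minibatch-plus-tempering setup quoted above more tangible, here is a hedged sketch of a tempered minibatch log-likelihood estimate: the full-data sum over N points is estimated from a minibatch of size b and scaled so the likelihood behaves like a dataset of effective size N0. This is one illustrative reading of the setup; the paper's exact tempering is defined in its Eq. (23), and `loglik_fn`, `tempered_minibatch_loglik`, and the scaling below are our own names and assumptions.

```python
import numpy as np

def tempered_minibatch_loglik(loglik_fn, data, theta, b=1000, N0=100, rng=None):
    """Estimate a tempered log-likelihood from a minibatch.

    loglik_fn(batch, theta) should return the summed log-likelihood of
    the batch. The minibatch sum is first scaled up to the full dataset
    (factor N/b), then tempered down to an effective size N0 (factor N0/N).
    """
    rng = rng or np.random.default_rng()
    N = len(data)
    idx = rng.choice(N, size=b, replace=False)   # subsample without replacement
    return (N / b) * loglik_fn(data[idx], theta) * (N0 / N)
```

With N0 = 100 and b = 1000 as in the quoted setup, each point in the minibatch effectively contributes with weight N0/b = 0.1, flattening the posterior relative to the untempered chain.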