Locally Private Bayesian Inference for Count Models
Authors: Aaron Schein, Zhiwei Steven Wu, Alexandra Schofield, Mingyuan Zhou, Hanna Wallach
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our method's utility using two case studies that involve real-world email data. We show that our method consistently outperforms the commonly used naïve approach, wherein inference proceeds as usual, treating the locally privatized data as if it were not privatized. |
| Researcher Affiliation | Collaboration | Aaron Schein¹, Zhiwei Steven Wu², Alexandra Schofield³, Mingyuan Zhou⁴, Hanna Wallach⁵. ¹University of Massachusetts Amherst; ²University of Minnesota; ³Cornell University; ⁴University of Texas at Austin; ⁵Microsoft. Correspondence to: Aaron Schein <aschein@cs.umass.edu>. |
| Pseudocode | No | The paper describes an MCMC algorithm using equations and textual explanations, but it does not include a formally structured pseudocode block or algorithm box. |
| Open Source Code | No | No explicit statement about providing open-source code for the described methodology or a link to a code repository was found. |
| Open Datasets | Yes | For our experiments using real-world data, we derived count matrices from the Enron email corpus (Klimt & Yang, 2004). |
| Dataset Splits | No | The paper mentions holding out elements for link prediction and using a non-privatized dataset for evaluation, but it does not provide specific train/validation/test dataset splits (e.g., percentages or counts) or reference standard splits. |
| Hardware Specification | No | No specific hardware details (e.g., CPU or GPU models, memory, or cloud instance types) used for running experiments were provided. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their respective versions) were mentioned as being used for the experiments. |
| Experiment Setup | Yes | For each method and data set, we ran 6,000 MCMC iterations, saving every 25th sample after the first 1,000. ... We used three privacy levels ϵ/N ∈ {3, 2, 1}. ... For each method and data set, we used K = 50 topics and ran 7,500 iterations of MCMC, saving every 100th sample of the latent variables after the first 2,500. ... We generated social networks of V = 20 actors with C = 5 communities. We randomly generated the true parameters θ_ic, π_cd ∼ Γ(a0, b0) with a0 = 0.01 and b0 = 0.5 to encourage sparsity; doing so exaggerates the block structure in the network. ... For each method and data set, we ran 8,500 MCMC iterations, saving every 25th sample after the first 1,000 and using these samples to compute µ̂_ij. ... We used C ∈ {5, 10, 20} and we used the saved samples to compute µ̂_ij. |
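The MCMC schedules quoted in the experiment-setup cell all follow the same burn-in-then-thin pattern. A minimal sketch of that bookkeeping is below; the helper name `thinned_sample_iterations` is illustrative (it does not appear in the paper), and the numbers match the paper's first reported run:

```python
def thinned_sample_iterations(n_iter, burn_in, thin):
    """Return the MCMC iteration numbers at which a sample is saved:
    discard the first `burn_in` iterations, then keep every
    `thin`-th iteration up to and including `n_iter`."""
    return list(range(burn_in + thin, n_iter + 1, thin))

# Schedule from the paper's first experiment: 6,000 iterations,
# saving every 25th sample after the first 1,000 -> 200 saved samples.
saved = thinned_sample_iterations(6000, 1000, 25)
print(len(saved))           # 200
print(saved[0], saved[-1])  # 1025 6000
```

The same helper reproduces the other reported schedules: 7,500 iterations with burn-in 2,500 and thinning 100 yields 50 saved samples, and 8,500 iterations with burn-in 1,000 and thinning 25 yields 300, which are the samples used to compute µ̂_ij.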