Locally Private Bayesian Inference for Count Models
Authors: Aaron Schein, Zhiwei Steven Wu, Alexandra Schofield, Mingyuan Zhou, Hanna Wallach
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our method's utility using two case studies that involve real-world email data. We show that our method consistently outperforms the commonly used naïve approach, wherein inference proceeds as usual, treating the locally privatized data as if it were not privatized. |
| Researcher Affiliation | Collaboration | Aaron Schein¹, Zhiwei Steven Wu², Alexandra Schofield³, Mingyuan Zhou⁴, Hanna Wallach⁵. ¹University of Massachusetts Amherst; ²University of Minnesota; ³Cornell University; ⁴University of Texas at Austin; ⁵Microsoft. Correspondence to: Aaron Schein <aschein@cs.umass.edu>. |
| Pseudocode | No | The paper describes an MCMC algorithm using equations and textual explanations, but it does not include a formally structured pseudocode block or algorithm box. |
| Open Source Code | No | No explicit statement about providing open-source code for the described methodology or a link to a code repository was found. |
| Open Datasets | Yes | For our experiments using real-world data, we derived count matrices from the Enron email corpus (Klimt & Yang, 2004). |
| Dataset Splits | No | The paper mentions holding out elements for link prediction and using a non-privatized dataset for evaluation, but it does not provide specific train/validation/test dataset splits (e.g., percentages or counts) or reference standard splits. |
| Hardware Specification | No | No specific hardware details (e.g., CPU or GPU models, memory, or cloud instance types) used for running experiments were provided. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their respective versions) were mentioned as being used for the experiments. |
| Experiment Setup | Yes | For each method and data set, we ran 6,000 MCMC iterations, saving every 25th sample after the first 1,000. ... We used three privacy levels ϵ/N ∈ {3, 2, 1}. ... For each method and data set, we used K = 50 topics and ran 7,500 iterations of MCMC, saving every 100th sample of the latent variables after the first 2,500. ... We generated social networks of V = 20 actors with C = 5 communities. We randomly generated the true parameters θ_ic, π_cd ∼ Γ(a0, b0) with a0 = 0.01 and b0 = 0.5 to encourage sparsity; doing so exaggerates the block structure in the network. ... For each method and data set, we ran 8,500 MCMC iterations, saving every 25th sample after the first 1,000 and using these samples to compute µ̂_ij. ... We used C ∈ {5, 10, 20} and we used the saved samples to compute µ̂_ij. |
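The MCMC schedules quoted in the experiment-setup cell all follow the same burn-in-then-thin pattern. A minimal sketch of that bookkeeping is below; the helper name `thinned_sample_iterations` is illustrative (it does not appear in the paper), and the numbers match the paper's first reported run:

```python
def thinned_sample_iterations(n_iter, burn_in, thin):
    """Return the MCMC iteration numbers at which a sample is saved:
    discard the first `burn_in` iterations, then keep every
    `thin`-th iteration up to and including `n_iter`."""
    return list(range(burn_in + thin, n_iter + 1, thin))

# Schedule from the paper's first experiment: 6,000 iterations,
# saving every 25th sample after the first 1,000 -> 200 saved samples.
saved = thinned_sample_iterations(6000, 1000, 25)
print(len(saved))           # 200
print(saved[0], saved[-1])  # 1025 6000
```

The same helper reproduces the other reported schedules: 7,500 iterations with burn-in 2,500 and thinning 100 yields 50 saved samples, and 8,500 iterations with burn-in 1,000 and thinning 25 yields 300, which are the samples used to compute µ̂_ij.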