Data Augmentation MCMC for Bayesian Inference from Privatized Data

Authors: Nianqiao Ju, Jordan Awan, Ruobin Gong, Vinayak Rao

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the efficacy and applicability of our methods on a naïve-Bayes log-linear model and on a linear regression model.
Researcher Affiliation | Academia | Nianqiao Phyllis Ju, Department of Statistics, Purdue University, West Lafayette, IN 47907, nianqiao@purdue.edu; Jordan A. Awan, Department of Statistics, Purdue University, West Lafayette, IN 47907, jawan@purdue.edu; Ruobin Gong, Department of Statistics, Rutgers University, Piscataway, NJ 08854, ruobin.gong@rutgers.edu; Vinayak A. Rao, Department of Statistics, Purdue University, West Lafayette, IN 47907, varao@purdue.edu
Pseudocode | Yes | Algorithm 1: One iteration of the privacy-aware Metropolis-within-Gibbs sampler (a hedged sketch of such an iteration follows the table).
Open Source Code | No | We will release our code to a public GitHub repository prior to the conference.
Open Datasets | No | The paper describes its simulation setup, stating 'We generate one non-private dataset from the model, and hold it fixed' and 'We generate one confidential dataset (x, y) and hold it fixed.' However, it does not provide concrete access information (link, DOI, or formal citation) for these simulated datasets or for any other public dataset used in the experiments.
Dataset Splits | No | The paper does not explicitly provide training/validation/test splits, percentages, or sample counts. It mentions running chains for a set number of iterations and discarding burn-in, which is standard MCMC practice rather than a traditional data-splitting methodology for model validation.
Hardware Specification | No | We use an internal cluster for our experiments. A single run with 10,000 iterations takes 2–3 hours for the log-linear model and 3–4 hours for the linear regression model.
Software Dependencies | No | The paper states 'We implement our algorithm in Python (version 3.9) using PyTorch (version 1.10.1)' only in the supplementary material, which is external to the main paper. The main paper itself mentions no specific software dependencies with version numbers.
Experiment Setup | Yes | For the simulation, we set N = 100 (number of records), I = 5 (number of classes), K = 5 (number of features), and Jk = 3 for all k = 1, ..., K (possible values for each feature). We evaluate our sampler for privacy levels corresponding to ε ∈ {.1, .3, 1, 3, 10}. We discard the first 5000 iterations as burn-in. (A configuration sketch of this setup follows below.)
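
The Pseudocode row above points to Algorithm 1, one iteration of the privacy-aware Metropolis-within-Gibbs sampler. Below is a minimal Python sketch of a generic data-augmentation iteration of this kind, under the assumption that each latent confidential record is proposed from the model given the current parameters, so that only the privacy mechanism's likelihood η(s | x) enters the acceptance ratio. The callables privacy_loglik, propose_record, and update_theta (and the name mwg_iteration) are hypothetical placeholders, not the authors' released code; this is an illustration rather than a faithful transcription of Algorithm 1.

```python
import numpy as np

def mwg_iteration(theta, x, s_dp, privacy_loglik, propose_record, update_theta, rng):
    """One data-augmentation Metropolis-within-Gibbs iteration (sketch).

    theta          : current model parameters
    x              : current imputed confidential records (list of records)
    s_dp           : observed privatized statistic
    privacy_loglik : callable, log eta(s_dp | x) for a candidate dataset (hypothetical)
    propose_record : callable, draws a candidate record from the model p(x_i | theta) (hypothetical)
    update_theta   : callable, samples theta | x, e.g. a conjugate update (hypothetical)
    rng            : numpy.random.Generator
    """
    log_eta_curr = privacy_loglik(s_dp, x)
    for i in range(len(x)):
        x_prop = list(x)
        x_prop[i] = propose_record(theta, rng)      # model-based proposal for record i
        log_eta_prop = privacy_loglik(s_dp, x_prop)
        # With the model as proposal, prior and proposal terms cancel,
        # leaving only the privacy mechanism in the acceptance ratio.
        if np.log(rng.uniform()) < log_eta_prop - log_eta_curr:
            x, log_eta_curr = x_prop, log_eta_prop
    theta = update_theta(x, rng)                    # standard Gibbs step for the parameters
    return theta, x
```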
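
The Experiment Setup row lists the simulation settings for the naïve-Bayes log-linear example. The snippet below sketches how one such synthetic dataset could be generated and held fixed, using the reported sizes (N = 100, I = 5, K = 5, Jk = 3), privacy grid, run length, and burn-in; the Dirichlet parameterization, the seed, and all variable names are assumptions made for illustration, not the paper's actual data-generating code.

```python
import numpy as np

# Settings reported in the Experiment Setup and Hardware rows above.
N, I, K = 100, 5, 5            # records, classes, features
J = [3] * K                    # possible values for each feature
EPSILONS = [0.1, 0.3, 1, 3, 10]
N_ITER, BURN_IN = 10_000, 5_000

def simulate_naive_bayes_data(rng):
    """Draw one synthetic dataset from a naive-Bayes model
    (hypothetical parameterization; the paper holds a single such dataset fixed)."""
    class_probs = rng.dirichlet(np.ones(I))                            # P(class)
    feature_probs = [rng.dirichlet(np.ones(J[k]), size=I) for k in range(K)]  # P(feature k | class)
    y = rng.choice(I, size=N, p=class_probs)
    x = np.stack([
        np.array([rng.choice(J[k], p=feature_probs[k][yi]) for yi in y])
        for k in range(K)
    ], axis=1)                                                          # shape (N, K)
    return x, y

rng = np.random.default_rng(0)
x, y = simulate_naive_bayes_data(rng)   # generated once and held fixed across privacy levels
```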