Differentially Private Bayesian Inference for Generalized Linear Models
Authors: Tejas Kulkarni, Joonas Jälkö, Antti Koskela, Samuel Kaski, Antti Honkela
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this Section we present experiments on logistic regression. Additional experiments on Poisson regression are included in the Supplement. |
| Researcher Affiliation | Academia | ¹Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Finland; ²Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Finland; ³Department of Computer Science, University of Manchester, United Kingdom. |
| Pseudocode | No | The paper refers to 'Algorithm 1 of Balle and Wang (2018)' but does not provide its own pseudocode or algorithm block. (A hedged sketch of that calibration step appears after this table.) |
| Open Source Code | No | The paper does not provide any explicit statement or link regarding the public availability of its source code. |
| Open Datasets | Yes | Datasets. We use Adult (Blake and Merz, 1998) and Diabetes (Kahn, 1994) datasets from UCI repository as these are standard and easy to explain. |
| Dataset Splits | No | The paper mentions training on 'randomly sampled 8000/40,000 records' for the Adult dataset and 'N = 758' for Diabetes, but it does not specify explicit training, validation, and test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper mentions 'computational resources' but does not provide specific hardware details such as GPU or CPU models, or memory specifications used for experiments. |
| Software Dependencies | No | The paper states models are 'specified in Stan (Carpenter et al., 2017) using its Python interface' but does not provide specific version numbers for Stan or Python, or other software dependencies. |
| Experiment Setup | Yes | Throughout our experiments, we use the first two central moments of the joint (X, y) as the summary statistics. We run 4 Markov chains in parallel and discard the first 50% as warm-up samples. We run DP-SGLD with batch-size N (as suggested by Abadi et al., 2016) for 10,000 iterations and discard the first 6000 samples as burn-in. The batch size and the learning rate chosen for DP-SGLD were 28 and 10⁻¹. For the data covariance matrix Σ we gave a scaled LKJ (...) prior. We scale a positive definite correlation matrix from the LKJ correlation distribution of shape η = 2 from both sides with a diagonal matrix with N(0, 2.5)-distributed diagonal entries. (...) we gave the regression coefficients' orientation a uniform prior, and the squared norm a truncated Chi-square prior. We treat the upper bound for the truncation as a hyper-parameter, which was set to 2 or 3 times the square of the non-private θ's norm. (Hedged sketches of the noisy summary statistics, the scaled LKJ prior, and a DP-SGLD baseline follow this table.) |
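
The mechanism behind the summary-statistics row is the release of the first two moments of the joint data, perturbed with Gaussian noise calibrated via Algorithm 1 of Balle and Wang (2018), the analytic Gaussian mechanism. Since the paper ships no code, the following is a minimal Python sketch under stated assumptions: records are clipped to a hypothetical L2 bound `clip`, the sensitivity estimate is a deliberately loose worst-case bound, and uncentered second moments stand in for the paper's central moments.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm


def analytic_gaussian_sigma(sensitivity, eps, delta):
    """Smallest sigma satisfying the exact Gaussian-mechanism condition of
    Balle & Wang (2018), found by bisection."""
    def excess_delta(sigma):
        a = sensitivity / (2.0 * sigma)
        b = eps * sigma / sensitivity
        return norm.cdf(a - b) - np.exp(eps) * norm.cdf(-a - b) - delta
    # excess_delta decreases monotonically in sigma, so a sign change is
    # guaranteed on a wide enough bracket.
    return brentq(excess_delta, 1e-8, 1e8)


def noisy_joint_moments(X, y, eps, delta, clip=1.0, seed=0):
    """Perturbed first and (uncentered) second moments of the joint data
    z = (X, y). The clipping bound `clip` and the loose sensitivity bound
    below are assumptions, not values from the paper."""
    rng = np.random.default_rng(seed)
    z = np.hstack([X, y[:, None]])
    norms = np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-12)
    z = z * np.minimum(1.0, clip / norms)       # bound each record's norm
    d = z.shape[1]
    stats = np.concatenate([z.sum(axis=0), (z.T @ z)[np.triu_indices(d)]])
    # One clipped record moves the concatenated sums by at most
    # sqrt(clip^2 + clip^4) in L2 norm (first moment + outer-product terms).
    sigma = analytic_gaussian_sigma(np.sqrt(clip**2 + clip**4), eps, delta)
    return stats + rng.normal(0.0, sigma, size=stats.shape)
```

The downstream step, inferring the posterior from these noisy statistics, is the paper's main contribution and is not sketched here.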
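The scaled LKJ prior from the setup quote can be written as a Stan model driven through Python, matching the "specified in Stan using its Python interface" row. This is a sketch of the construction the quote describes (an η = 2 LKJ correlation matrix scaled from both sides by a diagonal matrix), not the authors' code; the half-normal reading of the N(0, 2.5) scale prior (scales must be positive) and the PyStan 2 API are assumptions.

```python
import pystan  # PyStan 2-style interface; the exact version is not reported

# Scaled LKJ prior: Sigma = diag(tau) * Omega * diag(tau), with
# Omega ~ LKJ(eta = 2) and half-normal(0, 2.5) scales tau (an assumption;
# the paper only says the diagonal entries are N(0, 2.5) distributed).
SCALED_LKJ_MODEL = """
data {
  int<lower=1> d;
}
parameters {
  cholesky_factor_corr[d] L_Omega;  // Cholesky factor of the correlation
  vector<lower=0>[d] tau;           // per-coordinate scales
}
transformed parameters {
  matrix[d, d] L_Sigma = diag_pre_multiply(tau, L_Omega);
  matrix[d, d] Sigma = multiply_lower_tri_self_transpose(L_Sigma);
}
model {
  L_Omega ~ lkj_corr_cholesky(2);   // shape eta = 2, as quoted
  tau ~ normal(0, 2.5);             // half-normal via the lower=0 constraint
}
"""

model = pystan.StanModel(model_code=SCALED_LKJ_MODEL)
# Stan discards the first iter/2 draws as warm-up by default, which matches
# the quoted "discard the first 50% as warm-up samples".
fit = model.sampling(data={"d": 4}, chains=4, iter=2000)
```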
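The DP-SGLD baseline in the setup row follows the usual recipe: clip per-example log-likelihood gradients, rescale the minibatch estimate, and let the Langevin injection noise double as the privacy noise. Below is a minimal sketch for logistic regression using the quoted hyper-parameters (batch size 28, learning rate 10⁻¹, 10,000 iterations, 6,000 burn-in); the clipping bound, the standard-normal prior, labels in {-1, +1}, and the absence of any privacy accounting are simplifications, not the paper's exact procedure.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def dp_sgld_logreg(X, y, n_iter=10_000, burn_in=6_000, batch=28, lr=1e-1,
                   clip=1.0, seed=0):
    """DP-SGLD sketch for logistic regression with labels y in {-1, +1}.
    Per-example gradients are clipped to L2 norm `clip` (an assumed bound);
    the N(0, lr) Langevin noise supplies the Gaussian-mechanism noise."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    theta = np.zeros(d)
    samples = []
    for t in range(n_iter):
        idx = rng.choice(N, size=batch, replace=False)
        # Per-example gradient of log p(y_i | x_i, theta).
        g = (y[idx] * sigmoid(-y[idx] * (X[idx] @ theta)))[:, None] * X[idx]
        norms = np.maximum(np.linalg.norm(g, axis=1, keepdims=True), 1e-12)
        g *= np.minimum(1.0, clip / norms)           # bound each record's pull
        grad = -theta + (N / batch) * g.sum(axis=0)  # N(0, I) prior + likelihood
        theta = theta + 0.5 * lr * grad + rng.normal(0.0, np.sqrt(lr), size=d)
        if t >= burn_in:
            samples.append(theta.copy())
    return np.asarray(samples)
```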