Privacy for Free: Posterior Sampling and Stochastic Gradient Monte Carlo
Authors: Yu-Xiang Wang, Stephen Fienberg, Alex Smola
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate how our proposed methods work in practice, we selected two binary classification datasets: Abalone and Adult, from the first page of UCI Machine Learning Repository and performed privacy constrained logistic regression on them. Specifically, we compared two of our proposed methods, OPS mechanism and hybrid algorithm against the state-of-the-art empirical risk minimization algorithm OBJPERT (Chaudhuri et al., 2011; Kifer et al., 2012) under varying level of differential privacy protection. The results are shown in Figure 1. |
| Researcher Affiliation | Collaboration | Yu-Xiang Wang (YUXIANGW@CS.CMU.EDU), Stephen E. Fienberg (FIENBERG@STAT.CMU.EDU), Alexander J. Smola (ALEX@SMOLA.ORG). Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Marianas Labs Inc., Pittsburgh, PA 15213, USA |
| Pseudocode | Yes | Algorithm 1: One-Posterior Sample (OPS) estimator; Algorithm 2: Differentially Private Stochastic Gradient Langevin Dynamics (DP-SGLD); Algorithm 3: Hybrid Posterior Sampling Algorithm |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper. |
| Open Datasets | Yes | To evaluate how our proposed methods work in practice, we selected two binary classification datasets: Abalone and Adult, from the first page of UCI Machine Learning Repository |
| Dataset Splits | No | The paper describes using datasets but does not explicitly provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment. |
| Experiment Setup | Yes | Minibatch size and number of data passes in the hybrid DP-SGNHT are chosen to be both . All optimization based methods are solved using BFGS algorithm to high numerical accuracy. |
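
The experiment setup states that the optimization-based methods are solved with BFGS to high numerical accuracy. The following is a minimal sketch of what such a solve could look like for plain, non-private L2-regularized logistic regression. It is not the paper's code and does not implement OBJPERT, OPS, or any privacy mechanism. The function names, the regularization weight `lam`, the tolerance `gtol`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize


def logistic_loss(theta, X, y, lam):
    """Mean logistic loss for labels in {-1, +1}, plus an L2 penalty."""
    margins = y * (X @ theta)
    # logaddexp(0, -m) evaluates log(1 + exp(-m)) without overflow.
    return np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * (theta @ theta)


def logistic_grad(theta, X, y, lam):
    """Gradient of logistic_loss with respect to theta."""
    margins = y * (X @ theta)
    # sigmoid(-m) = 1 / (1 + exp(m)), written with tanh for stability.
    sig_neg = 0.5 * (1.0 - np.tanh(0.5 * margins))
    return -(X.T @ (sig_neg * y)) / len(y) + lam * theta


def fit_bfgs(X, y, lam=1e-2, gtol=1e-8):
    """Minimize the regularized loss with BFGS; lam and gtol are assumed values."""
    theta0 = np.zeros(X.shape[1])
    result = minimize(
        logistic_loss,
        theta0,
        args=(X, y, lam),
        jac=logistic_grad,
        method="BFGS",
        options={"gtol": gtol},
    )
    return result.x


# Synthetic stand-in data (hypothetical; the paper uses Abalone and Adult from UCI).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
theta_true = np.array([1.5, -2.0, 0.5])
y = np.where(X @ theta_true + 0.1 * rng.normal(size=200) > 0, 1.0, -1.0)

theta_hat = fit_bfgs(X, y)
accuracy = np.mean(np.sign(X @ theta_hat) == y)
```

A tight `gtol` is one way to approximate "high numerical accuracy"; the tolerance the authors actually used is not reported.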