Differentially Private Distributed Data Summarization under Covariate Shift
Authors: Kanthi Sarpatwar, Karthikeyan Shanmugam, Venkata Sitaramagiridharganesh Ganapavarapu, Ashish Jagmohan, Roman Vaculin
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Apart from theoretical guarantees, we demonstrate the efficacy of our protocol using real-world datasets. We show through empirical evaluation that this is indeed the case; by examining generalization error on two example tasks, we show that the protocol pays only a small price for its differential privacy guarantees. 4 Experimental Evaluation Experiments on Real World Datasets: We now back our theoretical results with empirical experiments. We compare three algorithms: a) Non-Private Greedy, where the aggregator broadcasts the (exact) average of the hashed summary set (i.e., W(D_{s,i})_q) and hashed validation set (i.e., W(D_{v,i})_m). This is equivalent to the approach of Kim et al. (2016). b) Private Greedy, which is Algorithm 3 with parameters set as above. c) Uniform Sampling, where we draw an equal number p/K of the required samples from each data provider to construct a summary of size p. We empirically show that private greedy closely matches the performance of non-private greedy even under the strong differential privacy constraints. For comparison, we show that our algorithm outperforms uniform sampling. The motivation for choosing the latter as a candidate comes from the typical manner of using stochastic gradient descent approaches such as Federated Learning [McMahan et al. (2016)] that perform uniform sampling. We experiment with two real-world datasets. |
| Researcher Affiliation | Industry | Kanthi K. Sarpatwar IBM Research sarpatwa@us.ibm.com, Karthikeyan Shanmugam IBM Research AI karthikeyan.shanmugam2@ibm.com, Venkata Sitaramagiridharganesh Ganapavarapu IBM Research giridhar.ganapavarapu@ibm.com, Ashish Jagmohan IBM Research ashishja@us.ibm.com, Roman Vaculin IBM Research vaculin@us.ibm.com |
| Pseudocode | Yes | Algorithm 1: Computing the hash function h1(·). Algorithm 2: Computing the hash function h2(·). Algorithm 3: Description of the protocol. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We experiment with two real-world datasets. We discuss one of them, which is based on an Allstate insurance dataset from a Kaggle (2014) competition. We show similar results for the MNIST dataset, which contains image data for recognizing handwritten digits, in Appendix G. |
| Dataset Splits | Yes | Validation data: From the remaining 30% of the Connecticut data, we choose 25% of the data as the validation set. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory specifications, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions software like Linear SVM, Xgboost, Decision Tree, and SVMs but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We set the parameters of our algorithm as follows: the RBF kernel parameter γ = 0.1 and the dimension of the Rahimi–Recht hash function h1(·) as d = 140. We use two different T parameters for different epochs, given by T_init (= T_ℓ, ℓ = 1) = d^{1.5} = 1656 and T_subs (= T_ℓ, ℓ > 1) = 5. ε_v = 0.01 is the parameter for h2(·) on the validation set, and ε_{ℓ,T} is set for h2(·) on summaries D_s over epochs as 0.05 for ℓ = 1 and 0.01/√T_subs for ℓ > 1. |
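The setup above fixes an RBF kernel parameter γ = 0.1 and a Rahimi–Recht hash dimension d = 140 for h1(·). As a minimal sketch of what such a hash looks like (this is not the authors' code; the function name `rff_hash` and the use of NumPy are illustrative assumptions), a standard random-Fourier-feature map approximating the RBF kernel exp(-γ‖x − y‖²) can be written as:

```python
import numpy as np

def rff_hash(X, d=140, gamma=0.1, rng=None):
    """Illustrative Rahimi-Recht random Fourier feature map (not the paper's code).

    Maps rows of X to d-dimensional features z(x) such that
    z(x) . z(y) approximates the RBF kernel exp(-gamma * ||x - y||^2).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n_features = X.shape[1]
    # For k(x, y) = exp(-gamma * ||x - y||^2), frequencies are drawn
    # from N(0, 2 * gamma * I), i.e. standard deviation sqrt(2 * gamma).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(n_features, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=d)
    return np.sqrt(2.0 / d) * np.cos(X @ W + b)
```

With d = 140 as in the paper's setup, the kernel approximation error per pair is on the order of 1/√d; a larger d tightens the estimate at the cost of larger messages between providers and the aggregator.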
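The Uniform Sampling baseline described in the experiments draws an equal number p/K of samples from each of the K data providers to build a summary of size p. A minimal sketch of that baseline (the helper name `uniform_summary` and the list-of-arrays input format are assumptions, not the paper's interface) could be:

```python
import numpy as np

def uniform_summary(provider_data, p, rng=None):
    """Illustrative uniform-sampling baseline (not the paper's code).

    provider_data: list of K arrays, one per data provider, each of
    shape (n_i, n_features). Draws p // K rows uniformly without
    replacement from each provider and concatenates them into one
    summary of (approximately) p rows.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    K = len(provider_data)
    per_provider = p // K
    parts = [
        data[rng.choice(len(data), size=per_provider, replace=False)]
        for data in provider_data
    ]
    return np.concatenate(parts, axis=0)
```

This baseline ignores the covariate shift between providers and the validation set, which is why the paper's greedy summarization (private or not) is reported to outperform it.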