Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Differentially Private Distributed Data Summarization under Covariate Shift
Authors: Kanthi Sarpatwar, Karthikeyan Shanmugam, Venkata Sitaramagiridharganesh Ganapavarapu, Ashish Jagmohan, Roman Vaculin
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Apart from theoretical guarantees, we demonstrate the efficacy of our protocol using real-world datasets. We show through empirical evaluation that this is indeed the case; by examining generalization error on two example tasks, we show that the protocol pays only a small price for its differential privacy guarantees. 4 Experimental Evaluation Experiments on Real World Datasets: We now back our theoretical results with empirical experiments. We compare three algorithms: a) Non-Private Greedy, where the aggregator broadcasts the (exact) average of the hashed summary set (i.e., W (Ds,i)q ) and hashed validation set (i.e., W (Dv,i)m ). This is equivalent to the approach of Kim et al. (2016). b) Private Greedy, which is the Algorithm 3 with parameters set as above. c) Uniform Sampling, where we draw equal number pK of required samples from each data provider to construct a summary of size p. We empirically show that private greedy closely matches the performance of non-private greedy even under the strong differential privacy constraints. For comparison, we show that our algorithm outperforms uniform sampling. The motivation for choosing the latter as a candidate comes from the typical manner of using stochastic gradient descent approaches such as Federated Learning [Mc Mahan et al. (2016)] that perform uniform sampling. We experiment with two real world datasets. |
| Researcher Affiliation | Industry | Kanthi K. Sarpatwar IBM Research EMAIL, Karthikeyan Shanmugam IBM Research AI EMAIL, Venkata Sitaramagiridharganesh Ganapavarapu IBM Research EMAIL, Ashish Jagmohan IBM Research EMAIL, Roman Vaculin IBM Research EMAIL |
| Pseudocode | Yes | Algorithm 1: Computing the hash function h1( )., Algorithm 2: Computing the hash function h2( )., Algorithm 3: Description of the protocol. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We experiment with two real world datasets. We discuss one of them, which is based on an Allstate insurance dataset from a Kaggle (2014) competition. We show similar results for the MNIST dataset, that contains image data for recognizing hand written digits, in the Appendix G. |
| Dataset Splits | Yes | Validation data: From the remaining 30% of Connecticut data we choose 25% of data as the validation data set. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory specifications, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions software like Linear SVM, Xgboost, Decision Tree, and SVMs but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We set the parameters of our algorithm as follows: the RBF kernel parameter γ = 0.1, dimension of Rahimi Recht hash function h1(.) as d = 140. We use two different T parameters for different epochs given by Tinit(= T, = 1) = d1.5 = 1656 and Tsubs(= T, > 1) = 5. "v = 0.01 is the parameter for h2( ) for the validation set and " ,T is set for h2( ) on summaries Ds over epochs as 0.05 for = 1, 0.01 pp Tsubs for > 1. |