Differentially Private Testing of Identity and Closeness of Discrete Distributions

Authors: Jayadev Acharya, Ziteng Sun, Huanyu Zhang

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We derive upper and lower bounds on the sample complexity of both problems under (ε, δ)-differential privacy. We provide sample-optimal algorithms for the identity testing problem in all parameter ranges, and the first results for closeness testing. Our closeness testing bounds are optimal in the sparse regime, where the number of samples is at most k. Our upper bounds are obtained by privatizing non-private estimators for these problems; the non-private estimators are chosen to have small sensitivity. We propose a general framework for establishing lower bounds on the sample complexity of statistical tasks under differential privacy: we show a bound on differentially private algorithms in terms of a coupling between the two hypothesis classes we aim to test. By carefully constructing priors over the hypothesis classes and using Le Cam's two-point theorem, we provide a general mechanism for proving lower bounds. We believe the framework can be used to obtain strong lower bounds for other statistical tasks under privacy. (Illustrative sketches of the privatization recipe and the coupling argument appear after this table.)
Researcher Affiliation | Academia | Jayadev Acharya, Cornell University, acharya@cornell.edu; Ziteng Sun, Cornell University, zs335@cornell.edu; Huanyu Zhang, Cornell University, hz388@cornell.edu
Pseudocode | Yes | Algorithm 1 (uniformity testing) and Algorithm 2. (A toy sketch of the privatize-and-threshold recipe appears after this table.)
Open Source Code | No | The paper is theoretical and focuses on algorithm design, analysis, and proving bounds. It does not mention providing open-source code for its described methodologies.
Open Datasets | No | The paper is theoretical and does not conduct experiments involving datasets for training. It discusses sampling from distributions conceptually but does not refer to specific publicly available datasets.
Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments that would involve training, validation, or test dataset splits.
Hardware Specification | No | The paper is theoretical and does not report on running experiments that would require specific hardware. No hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and focuses on mathematical proofs and algorithm design. It does not mention any specific software dependencies with version numbers for implementation or analysis.
Experiment Setup | No | The paper is theoretical and does not describe empirical experimental setups, including hyperparameters, training configurations, or system-level settings.
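The upper-bound recipe quoted in the Research Type row (privatize a non-private statistic chosen to have small sensitivity) can be illustrated with a minimal sketch. This is not the paper's Algorithm 1: the chi-square-style statistic, the use of the Laplace mechanism, and the `threshold` and `sensitivity` values are assumptions made only to show the generic privatize-and-threshold pattern.

```python
import numpy as np


def dp_uniformity_test(samples, k, epsilon, threshold, sensitivity):
    """Illustrative (epsilon, 0)-DP test via "privatize a low-sensitivity statistic".

    NOT the paper's Algorithm 1: the statistic, the threshold, and the
    sensitivity bound are placeholders that a real analysis would supply.
    """
    n = len(samples)
    counts = np.bincount(samples, minlength=k)
    expected = n / k
    # Chi-square-style distance of the empirical counts from the uniform distribution.
    stat = np.sum((counts - expected) ** 2 - counts) / expected
    # Laplace mechanism: noise scale = (assumed) sensitivity / epsilon.
    noisy_stat = stat + np.random.laplace(scale=sensitivity / epsilon)
    return "uniform" if noisy_stat <= threshold else "not uniform"


# Hypothetical usage with made-up parameter values.
samples = np.random.randint(0, 100, size=5000)  # i.i.d. draws from uniform over k = 100
print(dp_uniformity_test(samples, k=100, epsilon=1.0, threshold=50.0, sensitivity=10.0))
```

In the paper, the non-private statistic is specifically designed so that its sensitivity to changing one sample is provably small; here that bound is simply passed in as an assumed parameter.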
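The coupling-based lower-bound framework mentioned in the same row can be paraphrased, very informally and for the pure ε-DP case only, as follows; the constants are illustrative and this is not the paper's exact statement.

```latex
% Informal sketch (pure \varepsilon-DP; constants illustrative).
% Let (X, Y) be a coupling of the n-sample distributions induced by priors on the
% two hypothesis classes, with expected Hamming distance E[d_H(X, Y)] <= D.
\[
  \Pr[A(x)\ \text{accepts}] \;\le\; e^{\varepsilon\, d_H(x,y)}\,\Pr[A(y)\ \text{accepts}]
  \qquad \text{(group privacy, for any fixed datasets } x, y\text{)},
\]
\[
  \Pr\bigl[d_H(X, Y) > 10D\bigr] \;\le\; \tfrac{1}{10}
  \qquad \text{(Markov's inequality)},
\]
\[
  \Pr[A(X)\ \text{accepts}] \;\le\; \tfrac{1}{10} + e^{10\varepsilon D}\,\Pr[A(Y)\ \text{accepts}].
\]
% If A succeeds with probability at least 2/3 under both hypotheses, the left side
% is at least 2/3 while Pr[A(Y) accepts] is at most 1/3, forcing e^{10 \varepsilon D}
% to exceed a constant and hence D = \Omega(1/\varepsilon).
```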