Priv’IT: Private and Sample Efficient Identity Testing

Authors: Bryan Cai, Constantinos Daskalakis, Gautam Kamath

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally compare the sample complexity of our method to that of recently proposed methods for private hypothesis testing (Gaboardi et al., 2016; Kifer & Rogers, 2017). We performed an empirical evaluation of our algorithm, Priv'IT, on synthetic datasets. All experiments were performed on a laptop computer with a 2.6 GHz Intel Core i7-6700HQ CPU and 8 GB of RAM. The results of our first test are provided in Figure 1.
Researcher Affiliation | Academia | Bryan Cai *1, Constantinos Daskalakis *1, Gautam Kamath *1. 1 Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. Correspondence to: Bryan Cai <bcai@mit.edu>, Constantinos Daskalakis <costis@csail.mit.edu>, Gautam Kamath <g@csail.mit.edu>.
Pseudocode | Yes | Algorithm 1 Priv'IT: A differentially private identity tester
Open Source Code | No | The paper does not provide an explicit statement or link for the source code of the methodology described in the paper. It mentions implementing a modified version of Priv'IT but does not indicate where that code is available.
Open Datasets | No | The paper mentions using 'synthetic datasets' and the 'Paninski construction' for its experiments, which are distributions generated by the authors, not publicly available datasets with concrete access information such as a DOI, URL, or citation to a specific public repository.
Dataset Splits | No | The paper does not specify traditional training, validation, or test dataset splits. Instead, it describes generating samples from distributions (the 'Paninski construction') to evaluate statistical properties such as type I and type II errors, which is a different experimental setup from typical data splits. (A code sketch of this construction is given after the table.)
Hardware Specification | Yes | All experiments were performed on a laptop computer with a 2.6 GHz Intel Core i7-6700HQ CPU and 8 GB of RAM.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). It discusses algorithms and methods but not the specific software environment for replication.
Experiment Setup | Yes | Our experimental setup was as follows. The algorithms were provided q as the uniform distribution over [n]. The algorithms were also provided with samples from some distribution p. This (unknown) p was q for the case p = q, or a distribution which we call the Paninski construction for the case d_TV(p, q) ≥ α. We fixed parameters ε = 0.1 and α = 0.1. For each n, we ran the testing algorithms with increasing sample sizes m in order to discover the minimum sample size at which the type I and type II errors were both empirically below 1/3. To determine these empirical error rates, we ran all algorithms 1000 times for each n and m, and recorded the fraction of the time each algorithm was correct. As the other algorithms take a parameter β_I as a target type I error, we input 1/3 as this parameter. All other parameters are the same as in the previous test. Again with ε = 0.1, we ran zCDP-GOF with the guarantee of ε^2/2-zCDP and Priv'IT with the guarantee of (ε^2/2 + ε·√(2 log(1/δ)), δ)-differential privacy for various δ > 0. We note that δ is often thought in theory to be cryptographically small (such as 2^(-100)), but we compare with a wide range of δ, both large and small: δ = 1/e^t for t ∈ {1, 2, 4, 8, 16}. (A hedged code sketch of this evaluation protocol follows the table.)
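
The 'Paninski construction' cited in the Dataset Splits and Experiment Setup rows is the standard hard instance for uniformity testing: domain elements are paired, and each pair is perturbed in opposite directions so the result remains a valid distribution but is far from uniform in total variation. Below is a minimal Python sketch; the function name and the ±2α/n parameterization (which yields d_TV(p, q) = α exactly) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def paninski_construction(n, alpha, rng):
    """Build a distribution over [n] at total variation distance alpha from
    uniform by perturbing paired elements in opposite directions.

    Hedged sketch of the 'Paninski construction' referenced above, assuming
    the common +/- 2*alpha/n parameterization; not the authors' code.
    """
    assert n % 2 == 0 and 0 < alpha <= 0.5
    p = np.full(n, 1.0 / n)
    # For each pair (2i, 2i+1), move 2*alpha/n of probability mass from one
    # element to the other; random signs vary the perturbation pattern.
    signs = rng.choice([-1.0, 1.0], size=n // 2)
    p[0::2] += (2.0 * alpha / n) * signs
    p[1::2] -= (2.0 * alpha / n) * signs
    return p

# Example: a distance-0.1 instance on a domain of size 1000.
rng = np.random.default_rng(0)
p = paninski_construction(n=1000, alpha=0.1, rng=rng)
print(0.5 * np.abs(p - 1.0 / len(p)).sum())  # total variation distance: 0.1
```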
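The Experiment Setup row describes an empirical sample-complexity search: for each candidate sample size m, estimate the type I error on samples from q and the type II error on samples from the Paninski p over 1000 trials, and report the smallest m for which both are below 1/3. The scaffold below sketches that protocol under stated assumptions; `tester` is a placeholder for any identity tester with the interface `tester(samples, n) -> bool` (True meaning "p = q"), not Priv'IT itself, and the sketch reuses numpy and `paninski_construction` from the previous block.

```python
def empirical_error(tester, dist, n, m, trials, rng, null_is_true):
    """Fraction of trials on which the tester answers incorrectly when the
    m samples are drawn from `dist` (the true distribution)."""
    wrong = 0
    for _ in range(trials):
        samples = rng.choice(n, size=m, p=dist)
        accepts_null = tester(samples, n)  # True means "p = q"
        wrong += (accepts_null != null_is_true)
    return wrong / trials

def minimum_sample_size(tester, n, alpha=0.1, trials=1000, rng=None,
                        m_grid=range(100, 20001, 100)):
    """Smallest m on the grid with empirical type I and type II errors both
    below 1/3, mirroring the evaluation protocol quoted above."""
    rng = rng if rng is not None else np.random.default_rng(0)
    q = np.full(n, 1.0 / n)                    # null hypothesis: uniform over [n]
    p = paninski_construction(n, alpha, rng)   # alternative: d_TV(p, q) = alpha
    for m in m_grid:
        type1 = empirical_error(tester, q, n, m, trials, rng, null_is_true=True)
        type2 = empirical_error(tester, p, n, m, trials, rng, null_is_true=False)
        if type1 < 1 / 3 and type2 < 1 / 3:
            return m
    return None  # grid exhausted without meeting both error targets
```

Collecting the returned m for a range of domain sizes n, once per tester, gives the kind of sample-complexity comparison the Research Type row describes.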