Priv’IT: Private and Sample Efficient Identity Testing
Authors: Bryan Cai, Constantinos Daskalakis, Gautam Kamath
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally compare the sample complexity of our method to that of recently proposed methods for private hypothesis testing (Gaboardi et al., 2016; Kifer & Rogers, 2017). We performed an empirical evaluation of our algorithm, Priv'IT, on synthetic datasets. All experiments were performed on a laptop computer with a 2.6 GHz Intel Core i7-6700HQ CPU and 8 GB of RAM. The results of our first test are provided in Figure 1. |
| Researcher Affiliation | Academia | Bryan Cai * 1 Constantinos Daskalakis * 1 Gautam Kamath * 1 1Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. Correspondence to: Bryan Cai <bcai@mit.edu>, Constantinos Daskalakis <costis@csail.mit.edu>, Gautam Kamath <g@csail.mit.edu>. |
| Pseudocode | Yes | Algorithm 1 Priv'IT: A differentially private identity tester |
| Open Source Code | No | The paper does not provide an explicit statement or link for the source code of the methodology described in the paper. It mentions implementing a modified version of Priv'IT but does not indicate where that code is available. |
| Open Datasets | No | The paper mentions using 'synthetic datasets' and 'Paninski construction' for its experiments, which are distributions generated by the authors, not publicly available datasets with concrete access information like a DOI, URL, or citation to a specific public repository. |
| Dataset Splits | No | The paper does not specify traditional training, validation, or test dataset splits. Instead, it describes generating samples from distributions ('Paninski construction') to evaluate statistical properties like type I and type II errors, which is a different experimental setup from typical data splits. |
| Hardware Specification | Yes | All experiments were performed on a laptop computer with a 2.6 GHz Intel Core i7-6700HQ CPU and 8 GB of RAM. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). It discusses algorithms and methods but not the specific software environment for replication. |
| Experiment Setup | Yes | Our experimental setup was as follows. The algorithms were provided q, the uniform distribution over [n], and samples from some unknown distribution p. This p was either q itself (for the case p = q) or a distribution we call the Paninski construction (for the case d_TV(p, q) ≥ α). We fixed parameters ε = 0.1 and α = 0.1. For each n, we ran the testing algorithms with increasing sample sizes m to find the minimum sample size at which the type I and type II errors were both empirically below 1/3. To estimate these error rates, we ran all algorithms 1000 times for each n and m and recorded the fraction of runs on which each algorithm was correct. As the other algorithms take a parameter β_I as a target type I error, we supplied 1/3 for this parameter. All other parameters are the same as in the previous test. Again with ε = 0.1, we ran zCDP-GOF with the guarantee of ε^2/2-zCDP and Priv'IT with the guarantee of (ε^2/2 + ε√(2 log(1/δ)), δ)-differential privacy for various δ > 0. We note that δ is often taken in theory to be cryptographically small (such as 2^{-100}), but we compare with a wide range of δ, both large and small: δ = e^{-t} for t ∈ {1, 2, 4, 8, 16}. |
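The evaluation protocol quoted in the Experiment Setup row can be sketched in a few lines. The code below is not the authors' implementation (which is not released): it is a minimal illustration, assuming the standard Paninski construction (uniform over [n] with paired coordinates perturbed by ±2α/n) and a hypothetical `tester(samples, q)` callable that returns True when it accepts p = q; Priv'IT or any baseline tester would be plugged in there. `min_sample_size` mirrors the described search for the smallest m at which both empirical error rates fall below 1/3 over repeated trials.

```python
import numpy as np

def paninski(n, alpha):
    """Paninski construction: a distribution at total variation
    distance alpha from uniform over [n], built by perturbing
    paired coordinates by +/- 2*alpha/n (n assumed even)."""
    p = np.full(n, 1.0 / n)
    p[0::2] += 2.0 * alpha / n
    p[1::2] -= 2.0 * alpha / n
    return p

def min_sample_size(tester, q, p_alt, m_grid,
                    trials=1000, target=1/3, seed=0):
    """Return the smallest m in m_grid at which the empirical type I
    and type II errors of `tester` both fall below `target`, or None.
    `tester(samples, q)` returns True to accept the hypothesis p = q."""
    rng = np.random.default_rng(seed)
    n = len(q)
    for m in m_grid:
        # Type I error: rejecting when samples truly come from q.
        type1 = sum(not tester(rng.choice(n, size=m, p=q), q)
                    for _ in range(trials))
        # Type II error: accepting when samples come from p_alt.
        type2 = sum(tester(rng.choice(n, size=m, p=p_alt), q)
                    for _ in range(trials))
        if type1 / trials < target and type2 / trials < target:
            return m
    return None
```

For example, with n = 10 and α = 0.1, `paninski(10, 0.1)` sums to 1 and sits at total variation distance exactly 0.1 from uniform, matching the paper's alternative-hypothesis case.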