Differentially Private Chi-Squared Hypothesis Testing: Goodness of Fit and Independence Testing

Authors: Marco Gaboardi, Hyun Lim, Ryan Rogers, Salil Vadhan

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now show how each of our tests perform on simulated data when H0 holds in goodness of fit and independence testing. We fix our desired significance 1 − α = 0.95 and privacy level (ϵ, δ) = (0.1, 10⁻⁶) in all of our tests. By Theorem 5.3, we know that MCGOFD will have significance at least 1 − α. We then turn to our test PrivGOF to compute the proportion of trials that failed to reject H0 : p = p0 when it holds. In Figure 1 we give several different null hypotheses p0 and sample sizes n to show that PrivGOF achieves near 0.95 significance in all our tested cases. We also compare our results with how the original test GOF would perform if used on the private counts with either Laplace or Gaussian noise.
Researcher Affiliation | Academia | Marco Gaboardi (GABOARDI@BUFFALO.EDU), University at Buffalo, SUNY; Hyun Woo Lim (LIMHYUN@G.UCLA.EDU), University of California, Los Angeles; Ryan Rogers (RYROGERS@SAS.UPENN.EDU), University of Pennsylvania; Salil P. Vadhan (SALIL@SEAS.HARVARD.EDU), Harvard University
Pseudocode | Yes | Algorithm 1: Goodness of Fit Test for Multinomial Data; Algorithm 2: MC Goodness of Fit; Algorithm 3: Private Chi-Squared Goodness of Fit Test; Algorithm 4: Pearson Chi-Squared Independence Test; Algorithm 5: Two Step MLE Calculation; Algorithm 6: MC Independence Testing; Algorithm 7: Private Independence Test for r × c tables
Open Source Code | No | The paper does not contain any statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper states, 'We now show how each of our tests perform on simulated data when H0 holds in goodness of fit and independence testing.' This indicates the use of simulated data, not a publicly available dataset.
Dataset Splits | No | The paper describes experiments based on 'simulated data' and 'trials' (e.g., '10,000 trials', '1,000 trials') under specified null and alternative hypotheses. It does not mention traditional training/validation/test splits; instead it runs simulations to evaluate statistical properties such as significance and power.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to conduct the experiments.
Software Dependencies | No | The paper mentions using the R package 'CompQuadForm' and the 'imhof' method (Imhof, 1961) to compute critical values, but it does not specify version numbers for R or the package, which is required for a 'Yes' answer.
Experiment Setup | Yes | We fix our desired significance 1 − α = 0.95 and privacy level (ϵ, δ) = (0.1, 10⁻⁶) in all of our tests. ... We set the number of samples k = 50 in MCIndepD regardless of the noise we added and when we use Laplace noise, we set γ = 0.01 as the parameter in 2MLE. ... For our two goodness of fit tests, MCGOFD (with k = 100)... fixing parameters α = 0.05 and (ϵ, δ) = (0.1, 10⁻⁶).
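The quoted experiment setup (significance 1 − α = 0.95, privacy (ϵ, δ) = (0.1, 10⁻⁶), Laplace or Gaussian noise on the counts, many null-hypothesis trials) can be sketched in a few lines. The code below is a hedged illustration, not the authors' code: function names are ours, and it uses the standard histogram sensitivities (L1 = 2 for Laplace, L2 = √2 for the Gaussian mechanism) with the textbook Gaussian calibration. Unlike the paper's MCGOFD/PrivGOF, it compares the noisy statistic to the *classical* chi-squared critical value, so it demonstrates the uncorrected baseline the paper improves on rather than the corrected tests themselves.

```python
import numpy as np
from scipy import stats


def privatize_counts(counts, epsilon, delta, noise="gaussian", rng=None):
    """Add noise calibrated to differential privacy to histogram counts.

    Changing one record moves one unit of count between two cells, so the
    histogram has L1 sensitivity 2 (Laplace mechanism) and L2 sensitivity
    sqrt(2) (Gaussian mechanism, standard calibration for epsilon < 1).
    """
    rng = np.random.default_rng(rng)
    counts = np.asarray(counts, dtype=float)
    if noise == "laplace":
        return counts + rng.laplace(0.0, 2.0 / epsilon, size=counts.shape)
    sigma = np.sqrt(2.0) * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return counts + rng.normal(0.0, sigma, size=counts.shape)


def empirical_significance(p0, n, trials=1000, alpha=0.05,
                           epsilon=0.1, delta=1e-6, seed=0):
    """Fraction of null-distributed trials that fail to reject H0: p = p0.

    Hypothetical harness mirroring the experiment described above: sample
    multinomial counts under p0, privatize them, and compare the chi-squared
    statistic to the classical critical value. Without the paper's corrected
    reference distribution this naive test over-rejects under noise.
    """
    rng = np.random.default_rng(seed)
    p0 = np.asarray(p0, dtype=float)
    expected = n * p0
    crit = stats.chi2.ppf(1.0 - alpha, df=p0.size - 1)  # classical threshold
    kept = 0
    for _ in range(trials):
        noisy = privatize_counts(rng.multinomial(n, p0), epsilon, delta,
                                 rng=rng)
        stat = np.sum((noisy - expected) ** 2 / expected)
        kept += stat <= crit
    return kept / trials
```

Running `empirical_significance([0.25] * 4, n=5000)` estimates the empirical significance that the report's Figure 1 discussion tracks; with the paper's (ϵ, δ) = (0.1, 10⁻⁶) the noise scale is large relative to the cell counts, which is precisely why the corrected tests MCGOFD and PrivGOF are needed to keep significance near 0.95.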