Hypothesis Testing for Differentially Private Linear Regression

Authors: Daniel Alabi, Salil Vadhan

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through a suite of Monte Carlo-based experiments, we show that our tests achieve the desired significance levels and have high power that approaches the power of the non-private tests as we increase the sample size or the privacy-loss parameter.
Researcher Affiliation | Academia | Department of Computer Science and Data Science Institute, Columbia University; Harvard School of Engineering and Applied Sciences
Pseudocode | Yes | We provide Algorithm 1, a generic framework for DP Monte Carlo tests via a parametric bootstrap routine for estimating sufficient statistics. Algorithm 1 crucially relies on DPStats, a procedure that uses statistics of the independent and dependent variables to produce DP statistics.
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
Open Datasets | Yes | (3) Bike Sharing Dataset: We use a real-world dataset publicly available in the UCI Machine Learning Repository. The dataset consists of daily and hourly counts of bike rentals (with other information such as seasonal and weather information) in the Capital Bikeshare system in 2011 and 2012.
Dataset Splits | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Section E. We specify data splits for the mixture model tests and hyperparameters for both the linear and mixture model testers.
Hardware Specification | Yes | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Section E.
Software Dependencies | No | The paper does not explicitly state specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9').
Experiment Setup | Yes | General Parameter Setup for Synthetic Data: For experimental evaluation on synthetic datasets, we generated datasets with sizes between n = 100 and n = 10,000. For both the linear relationship and mixture model tests on synthetic data below, we consider the following values of ρ: {0.1²/2, 1²/2, 5²/2, 10²/2}. We draw the independent variables x_1, ..., x_n according to a few different distributions: Normal, Uniform, Exponential. For all tests below, the clipping parameter is set to either 2 or 3.
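The ρ values in the experiment setup appear to follow the zero-concentrated DP (zCDP) convention ρ = ε²/2, so the four settings correspond to ε ∈ {0.1, 1, 5, 10}, i.e. ρ ∈ {0.005, 0.5, 12.5, 50}. The short Python sketch below computes these values and the standard deviation of a Gaussian mechanism satisfying ρ-zCDP, σ = Δ/√(2ρ) for a query with L2 sensitivity Δ. Treating a clipping bound as the sensitivity of a single released statistic is an assumption made purely for illustration; the paper's actual noise calibration may differ.

```python
import math

# Epsilon values implied by rho in {0.1^2/2, 1^2/2, 5^2/2, 10^2/2}.
EPSILONS = [0.1, 1.0, 5.0, 10.0]


def rho_from_eps(eps: float) -> float:
    """Convert epsilon to the zCDP parameter via rho = eps^2 / 2."""
    return eps ** 2 / 2


def gaussian_sigma(sensitivity: float, rho: float) -> float:
    """Noise standard deviation for a rho-zCDP Gaussian mechanism.

    Adding N(0, sigma^2) noise to a query with L2 sensitivity `sensitivity`
    gives (sensitivity^2 / (2 * sigma^2))-zCDP, so sigma = sensitivity / sqrt(2 * rho).
    """
    return sensitivity / math.sqrt(2 * rho)


if __name__ == "__main__":
    # Assumption for illustration only: treat a clipping bound of 2 as the
    # L2 sensitivity of one released statistic.
    sensitivity = 2.0
    for eps in EPSILONS:
        rho = rho_from_eps(eps)
        print(f"eps = {eps:>4}: rho = {rho:g}, sigma = {gaussian_sigma(sensitivity, rho):.3g}")
```

With ρ = ε²/2 the noise scale simplifies to σ = Δ/ε, so a larger privacy-loss parameter means less noise, consistent with the reported power approaching that of the non-private tests as ρ grows.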
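Algorithm 1 is only summarized here. As a rough illustration of the underlying idea (a Monte Carlo test whose null distribution is simulated by re-running the same DP pipeline on synthetic data), the Python sketch below tests H0: slope = 0 in simple linear regression. It is not the authors' Algorithm 1 or DPStats. The helper `dp_stats` is a hypothetical stand-in that clips both variables to [-bound, bound] and releases five noisy sufficient statistics with the Gaussian mechanism under ρ-zCDP, assuming replace-one neighbors and a public n; the Gaussian null model for x and y is also an assumption. Because the bootstrap step only post-processes the noisy statistics, the whole test inherits the privacy guarantee of the single `dp_stats` call on the real data.

```python
import math
import random


def _clip(v: float, bound: float) -> float:
    return max(-bound, min(bound, v))


def dp_stats(x, y, bound, rho, rng):
    """Hypothetical stand-in for DPStats (not the paper's procedure).

    Clips x and y to [-bound, bound], then releases (Sx, Sy, Sxx, Sxy, Syy)
    with Gaussian noise calibrated to rho-zCDP, assuming replace-one
    neighbors and a public sample size n.
    """
    xs = [_clip(v, bound) for v in x]
    ys = [_clip(v, bound) for v in y]
    stats = [sum(xs), sum(ys), sum(a * a for a in xs),
             sum(a * b for a, b in zip(xs, ys)), sum(b * b for b in ys)]
    b2 = bound * bound
    # Worst-case per-coordinate changes 2b, 2b, b^2, 2b^2, b^2 combined in L2.
    sensitivity = math.sqrt(8 * b2 + 6 * b2 * b2)
    sigma = sensitivity / math.sqrt(2 * rho)
    return [s + rng.gauss(0.0, sigma) for s in stats]


def t_stat(stats, n):
    """OLS t-statistic for slope = 0, computed from sufficient statistics."""
    sx, sy, sxx, sxy, syy = stats
    sxx_c = max(sxx - sx * sx / n, 1e-12)  # floor: noise can make this <= 0
    sxy_c = sxy - sx * sy / n
    syy_c = syy - sy * sy / n
    slope = sxy_c / sxx_c
    sse = max(syy_c - slope * sxy_c, 1e-12)  # residual sum of squares, floored
    return slope / math.sqrt(sse / (n - 2) / sxx_c)


def dp_monte_carlo_test(x, y, bound, rho, n_sims=199, rng=None):
    """Monte Carlo p-value for H0: slope = 0 via a parametric bootstrap (sketch)."""
    if rng is None:
        rng = random.Random()
    n = len(x)
    noisy = dp_stats(x, y, bound, rho, rng)  # the only use of the private data
    t_obs = t_stat(noisy, n)

    # Null model fitted from the noisy statistics only (assumed Gaussian, slope 0).
    sx, sy, sxx, _, syy = noisy
    mu_x, mu_y = sx / n, sy / n
    sd_x = math.sqrt(max(sxx / n - mu_x ** 2, 1e-6))
    sd_y = math.sqrt(max(syy / n - mu_y ** 2, 1e-6))

    # Re-run the full DP pipeline on synthetic null datasets.
    exceed = 0
    for _ in range(n_sims):
        xb = [rng.gauss(mu_x, sd_x) for _ in range(n)]
        yb = [rng.gauss(mu_y, sd_y) for _ in range(n)]  # independent of xb
        if abs(t_stat(dp_stats(xb, yb, bound, rho, rng), n)) >= abs(t_obs):
            exceed += 1
    return (1 + exceed) / (n_sims + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0, 1) for _ in range(1000)]
    y = [0.5 * a + rng.gauss(0, 0.5) for a in x]
    print("p-value:", dp_monte_carlo_test(x, y, bound=2.0, rho=0.5, rng=rng))
```

On data with a clear nonzero slope this sketch should typically return the smallest attainable p-value, 1/(n_sims + 1). Estimating empirical significance levels and power, as in the Monte Carlo experiments summarized above, amounts to repeating such calls over many simulated datasets (under the null and under alternatives) and recording the rejection rate at a chosen level.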