reproducibilityindex.ai

Hypothesis Testing for Differentially Private Linear Regression

Authors: Daniel Alabi, Salil Vadhan

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through a suite of Monte Carlo based experiments, we show that our tests achieve desired significance levels and have a high power that approaches the power of the non-private tests as we increase sample sizes or the privacy-loss parameter.
Researcher Affiliation	Academia	(1) Department of Computer Science and Data Science Institute, Columbia University; (2) Harvard School of Engineering and Applied Sciences
Pseudocode	Yes	We provide Algorithm 1, a generic framework for DP Monte Carlo tests via a parametric bootstrap routine for estimating sufficient statistics. Algorithm 1 crucially relies on DPStats, a procedure that uses statistics of the independent and dependent variables to produce DP statistics.
Open Source Code	Yes	Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
Open Datasets	Yes	(3) Bike Sharing Dataset: We use a real-world dataset publicly available in the UCI machine learning repository. The dataset consists of daily and hourly counts (with other information such as seasonal and weather information) of bike rentals in the Capital bikeshare system in years 2011 and 2012.
Dataset Splits	Yes	Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Section E. We specify data splits for the mixture model tests and hyperparameters for both linear and mixture model testers.
Hardware Specification	Yes	Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Section E.
Software Dependencies	No	The paper does not explicitly state specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9').
Experiment Setup	Yes	General Parameter Setup for Synthetic Data: For experimental evaluation on synthetic datasets, we generated datasets with sizes between n = 100 and n = 10,000. For both the linear relationship and mixture model tests on synthetic data below, we consider the following values of ρ: {0.1^2/2, 1^2/2, 5^2/2, 10^2/2}. We draw the independent variables x_1, ..., x_n according to a few different distributions: Normal, Uniform, Exponential. For all tests below, the clipping parameter is set to either 2 or 3.
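
The synthetic-data setup in the last row can be enumerated as a small parameter grid. The sketch below is illustrative only and is not the authors' code. It reads the listed ρ values as ε²/2 for ε in {0.1, 1, 5, 10}. The intermediate sample sizes, the distribution parameters, the full cross product of settings, and the names `draw_x` and `experiment_grid` are all assumptions, since the table gives only the size range and the distribution families.

```python
import itertools
import random

# Privacy-loss values read from the setup as rho = eps^2 / 2.
EPSILONS = [0.1, 1.0, 5.0, 10.0]
RHOS = [eps ** 2 / 2 for eps in EPSILONS]

# The source states only the range n = 100 to n = 10,000;
# these specific grid points are an assumption.
SAMPLE_SIZES = [100, 1000, 10000]

# Clipping parameter values named in the setup.
CLIP_VALUES = [2, 3]

# Distribution families come from the setup; their parameters
# (standard normal, uniform on [0, 1], rate-1 exponential) are assumptions.
SAMPLERS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "uniform": lambda rng: rng.uniform(0.0, 1.0),
    "exponential": lambda rng: rng.expovariate(1.0),
}


def draw_x(dist, n, seed=0):
    """Draw n independent-variable values from the named distribution."""
    rng = random.Random(seed)
    sample = SAMPLERS[dist]
    return [sample(rng) for _ in range(n)]


def experiment_grid():
    """Enumerate (n, rho, distribution, clip) settings as a full cross product."""
    return list(itertools.product(SAMPLE_SIZES, RHOS, SAMPLERS, CLIP_VALUES))


if __name__ == "__main__":
    for n, rho, dist, clip in experiment_grid()[:3]:
        xs = draw_x(dist, n)
        print(n, rho, dist, clip, len(xs))
```

Whether the paper crosses every clipping value with every other setting is not stated in the table, so the cross product here is only one plausible way to organize the runs.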