Better Private Linear Regression Through Better Private Feature Selection
Authors: Travis Dick, Jennifer Gillenwater, Matthew Joseph
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove a utility guarantee for the setting where features are normally distributed and conduct experiments across 25 datasets. We find that adding this private feature selection step before regression significantly broadens the applicability of plug-and-play private linear regression algorithms at little additional cost to privacy, computation, or decision-making by the end user. |
| Researcher Affiliation | Industry | tdick@google.com, Google Research; jgillenw@gmail.com, work done while at Google Research; mtjoseph@google.com, Google Research |
| Pseudocode | Yes | Algorithm 1 DPKendall(D, k, ε) [...] Algorithm 2 SubLasso(D, k, m, ε) (a simplified, illustrative sketch of Kendall-correlation-based selection appears after the table) |
| Open Source Code | Yes | Experiment code may be found on Github [14]. |
| Open Datasets | Yes | Evaluated over a collection of 25 linear regression datasets taken from Tang et al. [30]. [...] Descriptions of the relevant algorithms appear in Section 4.1 and Section 4.2. Section 4.3 discusses the results. Experiment code may be found on Github [14]. [...] Figure 3: Parameters of the 25 datasets used in our experiments. |
| Dataset Splits | No | The splits are regenerated randomly each trial rather than fixed or published: For each algorithm and dataset, we run 10 trials using random 90-10 train-test splits and record the resulting test R^2 values. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | All experiments use (ln(3), 10^-5)-DP. Where applicable, 5% of the privacy budget is spent on private feature selection, 5% on choosing the number of models, and the remainder is spent on private regression. Throughout, we use η = 10^-4 as the failure probability for the lower bound used to choose the number of models. (A placeholder sketch of this protocol appears after the table.) |
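
The DPKendall pseudocode cited above takes a dataset D, a target number of features k, and a privacy budget ε. As a rough illustration of selection driven by Kendall rank correlation, here is a minimal Python sketch; it is not the paper's Algorithm 1 (which, among other things, also handles redundancy among already-selected features), and the function name `dp_kendall_select`, the 4/n sensitivity bound, and the one-shot Gumbel top-k rule are all simplifying assumptions made for this sketch.

```python
import numpy as np
from scipy.stats import kendalltau


def dp_kendall_select(X, y, k, epsilon, seed=None):
    """Pick k feature indices by noisy Kendall correlation with the label.

    Illustrative only: a one-shot Gumbel top-k over |tau(X_j, y)| scores,
    not the paper's DPKendall, which also penalizes redundancy among the
    features it has already selected.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    scores = np.empty(d)
    for j in range(d):
        tau, _ = kendalltau(X[:, j], y)  # rank correlation in [-1, 1]
        scores[j] = abs(tau) if np.isfinite(tau) else 0.0  # constant columns
    # Swapping one of the n rows flips at most n-1 of the n(n-1)/2 pairs,
    # so each score moves by at most 4/n (a conservative sensitivity bound).
    sensitivity = 4.0 / n
    # Gumbel-noise top-k is equivalent to k sequential exponential-mechanism
    # draws, each with budget epsilon / k, i.e. total budget epsilon.
    noise = rng.gumbel(scale=2.0 * sensitivity * k / epsilon, size=d)
    return np.argsort(scores + noise)[-k:][::-1]
```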
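
For the experiment-setup row, the sketch below shows one hypothetical way the reported protocol could be wired together: a total (ln(3), 10^-5)-DP budget split 5% / 5% / 90%, and 10 random 90-10 train-test splits per dataset with test R^2 recorded. The callables `select_features` and `fit_private_regression` are placeholders, not functions from the paper's released code; `select_features` could be, for example, the `dp_kendall_select` sketch above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Overall budget and split as reported in the Experiment Setup row.
TOTAL_EPS, TOTAL_DELTA = np.log(3), 1e-5
EPS_SELECT = 0.05 * TOTAL_EPS    # private feature selection
EPS_MODELS = 0.05 * TOTAL_EPS    # choosing the number of models
EPS_REGRESS = 0.90 * TOTAL_EPS   # private regression


def evaluate(X, y, select_features, fit_private_regression, k, n_trials=10):
    """Mean and std of test R^2 over random 90-10 train-test splits."""
    r2s = []
    for trial in range(n_trials):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.1, random_state=trial)
        feats = select_features(X_tr, y_tr, k, EPS_SELECT)
        # Remaining 95% of the budget covers model-count selection and
        # regression inside this placeholder routine.
        model = fit_private_regression(
            X_tr[:, feats], y_tr, EPS_MODELS + EPS_REGRESS, TOTAL_DELTA)
        r2s.append(r2_score(y_te, model.predict(X_te[:, feats])))
    return float(np.mean(r2s)), float(np.std(r2s))
```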