Better Private Linear Regression Through Better Private Feature Selection

Authors: Travis Dick, Jennifer Gillenwater, Matthew Joseph

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We prove a utility guarantee for the setting where features are normally distributed and conduct experiments across 25 datasets. We find that adding this private feature selection step before regression significantly broadens the applicability of plug-and-play private linear regression algorithms at little additional cost to privacy, computation, or decision-making by the end user." |
| Researcher Affiliation | Industry | Travis Dick (tdick@google.com, Google Research); Jennifer Gillenwater (jgillenw@gmail.com, work done while at Google Research); Matthew Joseph (mtjoseph@google.com, Google Research) |
| Pseudocode | Yes | Algorithm 1: DPKendall(D, k, ε) [...] Algorithm 2: SubLasso(D, k, m, ε) |
| Open Source Code | Yes | "Experiment code may be found on Github [14]." |
| Open Datasets | Yes | "Evaluated over a collection of 25 linear regression datasets taken from Tang et al. [30]. [...] Descriptions of the relevant algorithms appear in Section 4.1 and Section 4.2. Section 4.3 discusses the results. Experiment code may be found on Github [14]. [...] Figure 3: Parameters of the 25 datasets used in our experiments." |
| Dataset Splits | No | "For each algorithm and dataset, we run 10 trials using random 90-10 train-test splits and record the resulting test R² values." |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | "All experiments use (ln(3), 10⁻⁵)-DP. Where applicable, 5% of the privacy budget is spent on private feature selection, 5% on choosing the number of models, and the remainder is spent on private regression. Throughout, we use η = 10⁻⁴ as the failure probability for the lower bound used to choose the number of models." |
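The experiment-setup and dataset-split rows above can be sketched together in a few lines. This is a minimal illustration only: the privacy budget allocation (ε = ln 3 total, 5% to feature selection, 5% to choosing the number of models, the rest to regression) and the evaluation protocol (test R² averaged over 10 random 90-10 splits) come from the quotes above, but the `fit` routine here is a plain non-private least-squares stand-in, not the paper's DPKendall or SubLasso algorithms, and all function names are our own.

```python
import numpy as np

# Privacy budget allocation quoted in the Experiment Setup row:
# (ln(3), 1e-5)-DP overall; 5% to private feature selection,
# 5% to choosing the number of models, remainder to private regression.
EPS_TOTAL = np.log(3)
DELTA = 1e-5
eps_select = 0.05 * EPS_TOTAL
eps_models = 0.05 * EPS_TOTAL
eps_regress = EPS_TOTAL - eps_select - eps_models

def r_squared(y_true, y_pred):
    """Coefficient of determination R^2."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def evaluate(X, y, fit, trials=10, seed=0):
    """Average test R^2 over `trials` random 90-10 train-test splits.

    `fit` is a placeholder for a regression routine that returns a
    coefficient vector; the paper's private algorithms are not
    reproduced here.
    """
    rng = np.random.default_rng(seed)
    scores = []
    n = len(y)
    for _ in range(trials):
        perm = rng.permutation(n)
        cut = int(0.9 * n)          # 90-10 split
        tr, te = perm[:cut], perm[cut:]
        w = fit(X[tr], y[tr])
        scores.append(r_squared(y[te], X[te] @ w))
    return float(np.mean(scores))

# Non-private ordinary least squares, for illustration only.
ols = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
```

A private pipeline would slot its feature-selection step in before `fit`, charging `eps_select` for it and `eps_regress` for the regression itself.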