Better Private Linear Regression Through Better Private Feature Selection
Authors: Travis Dick, Jennifer Gillenwater, Matthew Joseph
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove a utility guarantee for the setting where features are normally distributed and conduct experiments across 25 datasets. We find that adding this private feature selection step before regression significantly broadens the applicability of plug-and-play private linear regression algorithms at little additional cost to privacy, computation, or decision-making by the end user. |
| Researcher Affiliation | Industry | tdick@google.com, Google Research; jgillenw@gmail.com, work done while at Google Research; mtjoseph@google.com, Google Research |
| Pseudocode | Yes | Algorithm 1 DPKendall(D, k, ε) [...] Algorithm 2 SubLasso(D, k, m, ε) (a simplified, illustrative sketch of Kendall-correlation-based selection appears after the table) |
| Open Source Code | Yes | Experiment code may be found on Github [14]. |
| Open Datasets | Yes | Evaluated over a collection of 25 linear regression datasets taken from Tang et al. [30]. [...] Descriptions of the relevant algorithms appear in Section 4.1 and Section 4.2. Section 4.3 discusses the results. Experiment code may be found on Github [14]. [...] Figure 3: Parameters of the 25 datasets used in our experiments. |
| Dataset Splits | No | The splits are regenerated randomly each trial rather than fixed or published: For each algorithm and dataset, we run 10 trials using random 90-10 train-test splits and record the resulting test R^2 values. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | All experiments use (ln(3), 10^-5)-DP. Where applicable, 5% of the privacy budget is spent on private feature selection, 5% on choosing the number of models, and the remainder is spent on private regression. Throughout, we use η = 10^-4 as the failure probability for the lower bound used to choose the number of models. (A placeholder sketch of this protocol appears after the table.) |
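
The DPKendall pseudocode cited above takes a dataset D, a target number of features k, and a privacy budget ε. As a rough illustration of selection driven by Kendall rank correlation, here is a minimal Python sketch; it is not the paper's Algorithm 1 (which, among other things, also handles redundancy among already-selected features), and the function name `dp_kendall_select`, the 4/n sensitivity bound, and the one-shot Gumbel top-k rule are all simplifying assumptions made for this sketch.

```python
import numpy as np
from scipy.stats import kendalltau


def dp_kendall_select(X, y, k, epsilon, seed=None):
    """Pick k feature indices by noisy Kendall correlation with the label.

    Illustrative only: a one-shot Gumbel top-k over |tau(X_j, y)| scores,
    not the paper's DPKendall, which also penalizes redundancy among the
    features it has already selected.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    scores = np.empty(d)
    for j in range(d):
        tau, _ = kendalltau(X[:, j], y)  # rank correlation in [-1, 1]
        scores[j] = abs(tau) if np.isfinite(tau) else 0.0  # constant columns
    # Swapping one of the n rows flips at most n-1 of the n(n-1)/2 pairs,
    # so each score moves by at most 4/n (a conservative sensitivity bound).
    sensitivity = 4.0 / n
    # Gumbel-noise top-k is equivalent to k sequential exponential-mechanism
    # draws, each with budget epsilon / k, i.e. total budget epsilon.
    noise = rng.gumbel(scale=2.0 * sensitivity * k / epsilon, size=d)
    return np.argsort(scores + noise)[-k:][::-1]
```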
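
For the experiment-setup row, the sketch below shows one hypothetical way the reported protocol could be wired together: a total (ln(3), 10^-5)-DP budget split 5% / 5% / 90%, and 10 random 90-10 train-test splits per dataset with test R^2 recorded. The callables `select_features` and `fit_private_regression` are placeholders, not functions from the paper's released code; `select_features` could be, for example, the `dp_kendall_select` sketch above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Overall budget and split as reported in the Experiment Setup row.
TOTAL_EPS, TOTAL_DELTA = np.log(3), 1e-5
EPS_SELECT = 0.05 * TOTAL_EPS    # private feature selection
EPS_MODELS = 0.05 * TOTAL_EPS    # choosing the number of models
EPS_REGRESS = 0.90 * TOTAL_EPS   # private regression


def evaluate(X, y, select_features, fit_private_regression, k, n_trials=10):
    """Mean and std of test R^2 over random 90-10 train-test splits."""
    r2s = []
    for trial in range(n_trials):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.1, random_state=trial)
        feats = select_features(X_tr, y_tr, k, EPS_SELECT)
        # Remaining 95% of the budget covers model-count selection and
        # regression inside this placeholder routine.
        model = fit_private_regression(
            X_tr[:, feats], y_tr, EPS_MODELS + EPS_REGRESS, TOTAL_DELTA)
        r2s.append(r2_score(y_te, model.predict(X_te[:, feats])))
    return float(np.mean(r2s)), float(np.std(r2s))
```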