Parameters or Privacy: A Provable Tradeoff Between Overparameterization and Membership Inference
Authors: Jasper Tan, Blake Mason, Hamid Javadi, Richard Baraniuk
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical and Experimental | In a theoretical direction, we prove for linear regression on Gaussian data in the overparameterized regime that increasing the number of parameters of the model increases its vulnerability to MI (Theorem 3.2). In a supporting empirical direction, we demonstrate that the same behavior holds for a range of more complex models: a latent space model, a time-series model, and a nonlinear random ReLU features model (Section 5). |
| Researcher Affiliation | Academia | Jasper Tan (Rice University), Blake Mason (Rice University), Hamid Javadi (Rice University), Richard G. Baraniuk (Rice University) |
| Pseudocode | No | The paper describes methods using mathematical formulations and prose, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is provided in https://github.com/tanjasper/parameters_or_privacy. |
| Open Datasets | No | The paper describes data generation processes (e.g., 'data points (x_i, y_i) where x_i ~ N(0, I_d) and y_i = x_i^T β + ε_i, where ε_i ~ N(0, σ²)') and defines specific data models (Latent Space Model, Time-Series Data, Random ReLU Features) for its experiments, rather than using or providing access information for existing publicly available datasets. (A sketch of this data-generation process is given below the table.) |
| Dataset Splits | No | The paper describes how training data is sampled ('Sampling n data points, we denote by X the n × d matrix whose i-th row is x_i^T and by y the n-dimensional vector of elements y_i') and how samples are treated for membership inference (the m = 0 vs. m = 1 cases), but it does not specify a separate validation dataset or its split percentages/counts for model tuning or evaluation. |
| Hardware Specification | Yes | Our experiments are implemented in Python 3.9 using PyTorch 1.10.1 and run on an Nvidia Tesla V100 GPU. |
| Software Dependencies | Yes | Our experiments are implemented in Python 3.9 using PyTorch 1.10.1 and run on an Nvidia Tesla V100 GPU. |
| Experiment Setup | Yes | In this experiment, we set n = 200, d = 20, and vary p. For each experiment, we sample a single x_0 ~ N(0, I_d) and a single set of w_j vectors, each from N(0, I_d), and keep them fixed. We leave the other variables random with the following distributions: z_i ~ N(0, I_d), ε_i ~ N(0, σ²), β ~ N(0, (1/d) I_d), and u_{i,j} ~ N(0, 1). (A simplified sketch of this setup is given below the table.) |
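
To make the quoted data model concrete, here is a minimal sketch of the paper's Gaussian linear-regression setting with a minimum-norm interpolating fit and a generic loss-thresholding membership-inference guess. The dimensions, noise level, and threshold are illustrative assumptions, and the thresholding attack is a standard stand-in rather than the specific attack analyzed in the paper; the authors' actual code is in the linked repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) dimensions: d > n puts the model in the
# overparameterized regime that Theorem 3.2 addresses.
n, d = 50, 200
sigma = 0.5  # assumed label-noise standard deviation

# Data model quoted above: x_i ~ N(0, I_d),
# y_i = x_i^T beta + eps_i with eps_i ~ N(0, sigma^2).
beta = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ beta + sigma * rng.normal(size=n)

# Minimum-norm least-squares interpolator: beta_hat = pinv(X) @ y.
beta_hat = np.linalg.pinv(X) @ y

def mi_guess(x, y_val, threshold=1e-6):
    """Guess membership: flag (x, y_val) as a training member if the
    model's squared error on it is near zero. Hypothetical helper,
    not from the authors' repository."""
    return (x @ beta_hat - y_val) ** 2 < threshold

# Training points interpolate exactly, so they are flagged as members;
# a fresh draw from the same distribution typically is not.
print(mi_guess(X[0], y[0]))           # member -> True
x_new = rng.normal(size=d)
y_new = x_new @ beta + sigma * rng.normal()
print(mi_guess(x_new, y_new))         # non-member -> usually False
```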
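
Similarly, here is a simplified sketch of a random ReLU features regression using the quoted constants n = 200 and d = 20, with p as the overparameterization knob. It omits the latent variables z_i and u_{i,j} from the quoted setup and uses a plain Gaussian linear teacher, so it illustrates the role of p rather than reproducing the authors' exact experimental pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Constants quoted above; p (the number of random ReLU features) is
# varied in the paper's experiment. p = 500 here is an assumed value.
n, d, p = 200, 20, 500
sigma = 0.5  # assumed noise level

# Fixed random weights w_j ~ N(0, I_d), sampled once and kept fixed.
W = rng.normal(size=(p, d))

def relu_features(X):
    """Map raw inputs to p random ReLU features: phi(x)_j = max(0, w_j^T x)."""
    return np.maximum(0.0, X @ W.T)

# Gaussian inputs with a linear teacher, following the quoted
# distributions (beta ~ N(0, (1/d) I_d), eps ~ N(0, sigma^2)).
beta = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ beta + sigma * rng.normal(size=n)

# Minimum-norm least squares on the feature matrix; once p > n the
# model interpolates the training data.
Phi = relu_features(X)                # (n, p) feature matrix
theta_hat = np.linalg.pinv(Phi) @ y
train_mse = np.mean((Phi @ theta_hat - y) ** 2)
print(f"p = {p}, training MSE = {train_mse:.2e}")  # ~0 for p > n
```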