Parameters or Privacy: A Provable Tradeoff Between Overparameterization and Membership Inference

Authors: Jasper Tan, Blake Mason, Hamid Javadi, Richard Baraniuk

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In a theoretical direction, we prove for linear regression on Gaussian data in the overparameterized regime that increasing the number of parameters of the model increases its vulnerability to MI (Theorem 3.2). In a supporting empirical direction, we demonstrate that the same behavior holds for a range of more complex models: a latent space model, a time-series model, and a nonlinear random ReLU features model (Section 5).
Researcher Affiliation | Academia | Jasper Tan (Rice University), Blake Mason (Rice University), Hamid Javadi (Rice University), Richard G. Baraniuk (Rice University)
Pseudocode | No | The paper describes methods using mathematical formulations and prose, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is provided in https://github.com/tanjasper/parameters_or_privacy.
Open Datasets | No | The paper describes data generation processes (e.g., 'data points (x_i, y_i) where x_i ~ N(0, I_d) and y_i = x_i^T β + ε_i, where ε_i ~ N(0, σ²)') and defines specific data models (Latent Space Model, Time-Series Data, Random ReLU Features) for its experiments, rather than using or providing access information for existing publicly available datasets.
Dataset Splits | No | The paper describes how training data is sampled ('Sampling n data points, we denote by X the n × D matrix whose ith row is x_i^T and by y the n-dimensional vector of elements y_i') and how samples are treated for membership inference (m=0 vs m=1 cases), but it does not specify a separate validation dataset or its split percentages/counts for model tuning or evaluation.
Hardware Specification | Yes | Our experiments are implemented in Python 3.9 using PyTorch 1.10.1 and run on an Nvidia Tesla V100 GPU.
Software Dependencies | Yes | Our experiments are implemented in Python 3.9 using PyTorch 1.10.1 and run on an Nvidia Tesla V100 GPU.
Experiment Setup | Yes | In this experiment, we set n = 200, d = 20, and vary p. For each experiment, we sample a single x_0 ~ N(0, I_d) and a single set of w_j vectors, each from N(0, I_d), and keep them fixed. We leave the other variables random with the following distributions: z_i ~ N(0, I_d), ε_i ~ N(0, σ²), β ~ N(0, (1/d) I_d), and u_{i,j} ~ N(0, 1).
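
The synthetic Gaussian data model quoted under 'Open Datasets' is straightforward to reproduce. Below is a minimal NumPy sketch of that generation process; the values of n, D, and σ in the usage line, and the choice β ~ N(0, (1/D) I_D) (mirroring the β prior quoted in the 'Experiment Setup' row), are illustrative assumptions rather than settings taken from the paper. The authors' released code at https://github.com/tanjasper/parameters_or_privacy is the authoritative implementation.

```python
import numpy as np

def sample_gaussian_linear_data(n, D, sigma, rng):
    """Draw (X, y) with rows x_i ~ N(0, I_D) and y_i = x_i^T beta + eps_i, eps_i ~ N(0, sigma^2).

    Minimal sketch of the data model quoted above; the beta prior N(0, (1/D) I_D)
    is an assumption for this sketch, not a setting copied from the paper's code.
    """
    beta = rng.normal(0.0, 1.0 / np.sqrt(D), size=D)   # coefficient vector
    X = rng.normal(size=(n, D))                         # covariates x_i ~ N(0, I_D)
    y = X @ beta + rng.normal(0.0, sigma, size=n)       # noisy linear responses
    return X, y, beta

rng = np.random.default_rng(0)
# Illustrative overparameterized setting (D > n); these numbers are for the sketch only.
X, y, beta = sample_gaussian_linear_data(n=50, D=200, sigma=1.0, rng=rng)
```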
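
The 'Dataset Splits' row notes that membership is framed as a binary variable m (m=1 when the queried sample was part of training, m=0 otherwise) rather than as a conventional train/validation split. The sketch below plays one round of such a membership-inference game against a minimum-norm least-squares fit; the loss-threshold decision rule and every constant are our own illustrative choices and stand in for, but are not, the attack analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, D, sigma = 50, 200, 1.0           # illustrative sizes (D > n: overparameterized)

def one_mi_round(rng, threshold=0.5):
    """Play one membership-inference round: m=1 queries a training point, m=0 a fresh point."""
    beta = rng.normal(0.0, 1.0 / np.sqrt(D), size=D)
    X = rng.normal(size=(n, D))
    y = X @ beta + rng.normal(0.0, sigma, size=n)

    m = rng.integers(0, 2)            # membership bit, uniform over {0, 1}
    if m == 1:                        # query one of the training samples
        x_q, y_q = X[0], y[0]
    else:                             # query a fresh sample from the same distribution
        x_q = rng.normal(size=D)
        y_q = x_q @ beta + rng.normal(0.0, sigma)

    theta = np.linalg.pinv(X) @ y     # minimum-norm least-squares (interpolating) solution
    residual = (y_q - x_q @ theta) ** 2
    guess = 1 if residual < threshold else 0   # generic loss-threshold attack (illustrative)
    return guess == m

accuracy = np.mean([one_mi_round(rng) for _ in range(2000)])
print(f"attack accuracy over 2000 rounds: {accuracy:.3f}")
```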
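
The 'Experiment Setup' row fixes n = 200 and d = 20 and varies the number of random features p. As a rough illustration of what a random ReLU features model looks like in that setting, the sketch below maps d-dimensional Gaussian inputs through p fixed random units ReLU(x^T w_j) with w_j ~ N(0, I_d) and fits a minimum-norm linear model on the resulting features; the linear label-generation step and the sweep values of p are simplified assumptions, not the paper's exact latent-space construction.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, sigma = 200, 20, 1.0                     # sizes quoted in the 'Experiment Setup' row

def relu_features(X, W):
    """Random ReLU feature map: column j is ReLU(x^T w_j) for a fixed direction w_j."""
    return np.maximum(X @ W, 0.0)

for p in [50, 100, 400, 1600]:                 # illustrative sweep over the parameter count p
    W = rng.normal(size=(d, p))                # fixed random feature directions w_j ~ N(0, I_d)
    beta = rng.normal(0.0, 1.0 / np.sqrt(d), size=d)

    X = rng.normal(size=(n, d))                # inputs x_i ~ N(0, I_d)
    y = X @ beta + rng.normal(0.0, sigma, size=n)   # simplified linear labels (assumption)

    Phi = relu_features(X, W)                  # n x p feature matrix
    theta = np.linalg.pinv(Phi) @ y            # minimum-norm fit in feature space
    train_mse = np.mean((Phi @ theta - y) ** 2)
    print(f"p = {p:5d}  train MSE = {train_mse:.4f}")
```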