Differentially Private Generalized Linear Models Revisited

Authors: Raman Arora, Raef Bassily, Cristóbal Guzmán, Michael Menart, Enayat Ullah

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study the problem of $(\epsilon, \delta)$-differentially private learning of linear predictors with convex losses. We provide results for two subclasses of loss functions. The first case is when the loss is smooth and non-negative but not necessarily Lipschitz (such as the squared loss). For this case, we establish an upper bound on the excess population risk of $\tilde{O}\left(\frac{\|w^*\|}{\sqrt{n}} + \min\left\{\frac{\|w^*\|^{2}}{(n\epsilon)^{2/3}}, \frac{\sqrt{d}\,\|w^*\|^{2}}{n\epsilon}\right\}\right)$, where $n$ is the number of samples, $d$ is the dimension of the problem, and $w^*$ is the minimizer of the population risk. Apart from the dependence on $\|w^*\|$, our bound is essentially tight in all parameters. In particular, we show a lower bound of $\tilde{\Omega}\left(\frac{1}{\sqrt{n}} + \min\left\{\frac{\|w^*\|^{4/3}}{(n\epsilon)^{2/3}}, \frac{\sqrt{d}\,\|w^*\|}{n\epsilon}\right\}\right)$. We also revisit the previously studied case of Lipschitz losses [SSTT20]. For this case, we close the gap in the existing work and show that the optimal rate is (up to log factors) $\Theta\left(\frac{\|w^*\|}{\sqrt{n}} + \min\left\{\frac{\|w^*\|}{\sqrt{n\epsilon}}, \frac{\sqrt{\mathrm{rank}}\,\|w^*\|}{n\epsilon}\right\}\right)$, where rank is the rank of the design matrix. This improves over existing work in the high privacy regime. Finally, our algorithms involve a private model selection approach that we develop to enable attaining the stated rates without a-priori knowledge of $\|w^*\|$. Our work is largely theoretical. Moreover, there are no foreseeable negative societal impacts for our proposed algorithms. (These three rates are restated in display form below the table.)
Researcher Affiliation | Academia | Raman Arora, Department of Computer Science, The Johns Hopkins University, arora@cs.jhu.edu; Raef Bassily, Department of Computer Science & Engineering and Translational Data Analytics Institute (TDAI), The Ohio State University, bassily.1@osu.edu; Cristóbal Guzmán, Inst. for Mathematical and Comput. Eng., Fac. de Matemáticas and Esc. de Ingeniería, Pontificia Universidad Católica de Chile, c.guzman@utwente.nl; Michael Menart, Department of Computer Science & Engineering, The Ohio State University, menart.2@osu.edu; Enayat Ullah, Department of Computer Science, The Johns Hopkins University, enayat@jhu.edu
Pseudocode | Yes | Algorithm 1 (Noisy GD). Input: dataset S, loss function ℓ, constraint set W, steps T, learning rate η, noise scale σ². 1: w_0 = 0. 2: for t = 1 to T − 1 do. 3: ξ ∼ N(0, σ²I). 4: w_{t+1} = Π_W(w_t − η(∇L̂(w_t; S) + ξ)), where Π_W is the Euclidean projection onto W. 5: end for. Output: the average iterate (1/T) Σ_{t=1}^{T} w_t. The paper also presents Algorithm 2 (JL Method), Algorithm 3 (Regularized Output Perturbation), and Algorithm 4 (Private Grid Search). Hedged code sketches of these algorithms appear after the table.
Open Source Code | No | The paper does not provide explicit access to source code for the methodology. The ethics statement indicates N/A for code.
Open Datasets | No | The paper refers to theoretical constructs such as a training set S_1 ∈ (X × Y)^{n/2}, but it does not use or provide access to any actual dataset for empirical training. The ethics statement indicates N/A for data.
Dataset Splits | No | The paper mentions a 'validation set S_2' as part of Algorithm 4 for theoretical model selection, but it does not specify any dataset splits for validation in empirical experiments. The ethics statement indicates N/A for data.
Hardware Specification | No | The paper is theoretical and does not conduct experiments; therefore, no hardware specifications are mentioned. The ethics statement indicates N/A for compute resources.
Software Dependencies | No | The paper is theoretical and does not conduct experiments, so no specific software dependencies with version numbers are listed.
Experiment Setup | No | The paper is theoretical and does not conduct experiments, so no experimental setup details such as hyperparameters or training settings are provided.
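
For readability, the three excess-risk rates quoted in the Research Type row above are restated here in display form (a transcription of the abstract's bounds; n is the sample size, d the dimension, ε the privacy parameter, w* the population risk minimizer, and rank the rank of the design matrix):

Smooth, non-negative (not necessarily Lipschitz) losses, upper bound:
\[ \tilde{O}\!\left(\frac{\|w^*\|}{\sqrt{n}} + \min\left\{\frac{\|w^*\|^{2}}{(n\epsilon)^{2/3}},\; \frac{\sqrt{d}\,\|w^*\|^{2}}{n\epsilon}\right\}\right) \]

Matching lower bound:
\[ \tilde{\Omega}\!\left(\frac{1}{\sqrt{n}} + \min\left\{\frac{\|w^*\|^{4/3}}{(n\epsilon)^{2/3}},\; \frac{\sqrt{d}\,\|w^*\|}{n\epsilon}\right\}\right) \]

Lipschitz losses, optimal rate up to log factors:
\[ \Theta\!\left(\frac{\|w^*\|}{\sqrt{n}} + \min\left\{\frac{\|w^*\|}{\sqrt{n\epsilon}},\; \frac{\sqrt{\mathrm{rank}}\,\|w^*\|}{n\epsilon}\right\}\right) \]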
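
Algorithm 1 (Noisy GD) from the Pseudocode row translates directly into code. The following is a minimal NumPy sketch, not the authors' implementation: grad_loss(w, X, y) is an assumed caller-supplied gradient of the empirical loss, the constraint set W is taken to be a Euclidean ball of a given radius, and the noise scale sigma is passed in as-is (the paper's calibration of sigma to the (ε, δ) privacy budget is not reproduced here).

import numpy as np

def noisy_gd(X, y, grad_loss, radius, steps, lr, sigma, rng=None):
    # Noisy projected gradient descent in the spirit of Algorithm 1 (Noisy GD).
    # Assumptions (not from the paper): grad_loss(w, X, y) returns the gradient
    # of the empirical loss at w; the constraint set W is the Euclidean ball of
    # the given radius; sigma is supplied by the caller.
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]
    w = np.zeros(d)
    iterates = []
    for _ in range(steps):
        xi = rng.normal(0.0, sigma, size=d)        # Gaussian noise draw
        w = w - lr * (grad_loss(w, X, y) + xi)     # noisy gradient step
        norm = np.linalg.norm(w)
        if norm > radius:                          # Euclidean projection onto W
            w = w * (radius / norm)
        iterates.append(w)
    return np.mean(iterates, axis=0)               # averaged iterate

# Example plug-in: average gradient of the squared loss for a linear predictor.
def squared_loss_grad(w, X, y):
    return X.T @ (X @ w - y) / len(y)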
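
Algorithm 2 (JL Method) is based on a Johnson-Lindenstrauss projection of the feature vectors before private learning in the lower-dimensional space. The sketch below shows only the generic projection step with a Gaussian JL matrix; the target dimension k and the downstream private learner are assumptions, not a transcription of the paper's algorithm.

import numpy as np

def jl_project(X, k, rng=None):
    # Generic Johnson-Lindenstrauss projection of an n x d feature matrix to n x k.
    # Entries of the projection matrix are i.i.d. N(0, 1/k), so inner products are
    # preserved in expectation. Not the paper's exact Algorithm 2.
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]
    Phi = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
    return X @ Phi, Phi   # projected features and the map itself

A private learner such as noisy_gd above would then be run on the projected features; new points are projected with the same Phi before prediction.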
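
Algorithm 3 (Regularized Output Perturbation) releases the minimizer of a regularized empirical risk plus calibrated noise. The sketch below follows the classical output-perturbation recipe for a Lipschitz loss, where the l2-sensitivity of the λ-regularized minimizer is at most 2G/(nλ) and Gaussian noise is calibrated to (ε, δ); the solver and the exact calibration used in the paper are assumptions here, not the paper's statement.

import numpy as np

def regularized_output_perturbation(X, y, solve_erm, lam, lipschitz, eps, delta, rng=None):
    # Hedged sketch of output perturbation for regularized ERM (not the paper's
    # exact Algorithm 3). Assumptions: solve_erm(X, y, lam) returns the minimizer
    # of the lam-regularized empirical risk; the loss is lipschitz-Lipschitz in w,
    # so the l2-sensitivity of the minimizer is at most 2 * lipschitz / (n * lam).
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    w_hat = solve_erm(X, y, lam)
    sensitivity = 2.0 * lipschitz / (n * lam)
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps   # Gaussian mechanism
    return w_hat + rng.normal(0.0, sigma, size=w_hat.shape)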
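
Algorithm 4 (Private Grid Search) is the private model-selection step referenced in the Open Datasets and Dataset Splits rows: a privately trained candidate is produced for each point of a grid (e.g. of candidate norm bounds on w*), and one candidate is selected using a held-out half of the data. The paper's exact selection rule is not reproduced here; the sketch below uses a standard private selection mechanism (report-noisy-min with Laplace noise on holdout losses), with the grid, base learner, and loss bound left as caller-supplied assumptions.

import numpy as np

def private_grid_search(candidate_radii, train_model, validation_loss,
                        S_train, S_val, eps, loss_bound, rng=None):
    # Hedged sketch of private model selection (not the paper's exact Algorithm 4).
    # Assumptions: train_model(S_train, r) is itself differentially private;
    # validation_loss(model, S_val) returns an average loss in [0, loss_bound],
    # so its per-record sensitivity is loss_bound / |S_val|; selection is done by
    # report-noisy-min, adding Laplace noise of scale 2 * sensitivity / eps.
    rng = np.random.default_rng() if rng is None else rng
    models, noisy_scores = [], []
    for r in candidate_radii:
        model = train_model(S_train, r)
        loss = validation_loss(model, S_val)
        sensitivity = loss_bound / len(S_val)
        noisy_scores.append(loss + rng.laplace(0.0, 2.0 * sensitivity / eps))
        models.append(model)
    return models[int(np.argmin(noisy_scores))]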