Public-data Assisted Private Stochastic Optimization: Power and Limitations
Authors: Enayat Ullah, Michael Menart, Raef Bassily, Cristóbal Guzmán, Raman Arora
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the limits and capability of public-data assisted differentially private (PA-DP) algorithms. Specifically, we focus on the problem of stochastic convex optimization (SCO) with either labeled or unlabeled public data. For complete/labeled public data, we show that any $(\epsilon, \delta)$-PA-DP has excess risk $\Omega\left(\min\left\{\frac{1}{\sqrt{n_{\text{pub}}}}, \frac{1}{\sqrt{n}} + \frac{\sqrt{d}}{n\epsilon}\right\}\right)$ (...) These lower bounds are established via our new lower bounds for PA-DP mean estimation, which are of a similar form. First, we show a tight lower bound for the problem of differentially-private stochastic convex optimization (DP-SCO) assisted with complete public data (...) For (Euclidean) GLMs we develop an efficient algorithm which, given $O(n_{\text{priv}}\epsilon)$ unlabeled public data points, achieves the dimension independent rate $O\left(\frac{1}{\sqrt{n_{\text{priv}}}} + \frac{1}{\sqrt{n_{\text{priv}}\epsilon}}\right)$. |
| Researcher Affiliation | Collaboration | Enayat Ullah (Meta; enayat@meta.com). Michael Menart (Department of Computer Science & Engineering, The Ohio State University; Department of Computer Science, University of Toronto; Vector Institute; menart.2@osu.edu). Raef Bassily (Department of Computer Science & Engineering and Translational Data Analytics Institute (TDAI), The Ohio State University; bassily.1@osu.edu). Cristóbal Guzmán (Inst. for Mathematical and Comput. Eng., Fac. de Matemáticas and Esc. de Ingeniería, Pontificia Universidad Católica de Chile; crguzmanp@uc.cl). Raman Arora (Department of Computer Science, Johns Hopkins University; arora@cs.jhu.edu). |
| Pseudocode | Yes | Algorithm 1 Efficient PA-DP learning of GLMs with unlabeled public data. Algorithm 2 Supervised private learning with public unlabeled data. |
| Open Source Code | No | The paper does not mention providing open-source code for the methodology described. The NeurIPS Paper Checklist states 'Answer: [NA] Justification: There is no associated code.' |
| Open Datasets | No | The paper is theoretical and does not involve empirical evaluation on specific datasets. It defines concepts like '$S_{\text{pub}}, S_{\text{priv}} \overset{\text{i.i.d.}}{\sim} \mathcal{D}$' for theoretical analysis but does not refer to any specific, publicly available datasets used for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments, thus no training, validation, or test dataset splits are mentioned. |
| Hardware Specification | No | The paper is theoretical and does not conduct experiments, thus no hardware specifications are mentioned. The NeurIPS Paper Checklist states 'Answer: [NA] Justification: There are no experiments.' |
| Software Dependencies | No | The paper is theoretical and does not conduct experiments, thus no specific software dependencies with version numbers are mentioned. The NeurIPS Paper Checklist states 'Answer: [NA] Justification: There are no experiments.' |
| Experiment Setup | No | The paper is theoretical and does not conduct experiments, thus no experimental setup details like hyperparameters or training settings are provided. The NeurIPS Paper Checklist states 'Answer: [NA] Justification: There are no experiments.' |
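
The rates quoted in the Research Type row can be made concrete with a small numeric sketch. The functions below evaluate min{1/√n_pub, 1/√n + √d/(nε)} and 1/√n_priv + 1/√(n_priv·ε), ignoring constants and logarithmic factors. The function names, the convention n = n_pub + n_priv, and the sample values are illustrative assumptions; none of this is code from the paper, which has no associated code.

```python
import math


def pa_dp_sco_lower_bound(n_pub, n_priv, d, eps):
    """Evaluate min{1/sqrt(n_pub), 1/sqrt(n) + sqrt(d)/(n*eps)}.

    Assumption: n = n_pub + n_priv (total sample count).
    Constants and log factors are ignored, so only the scaling is meaningful.
    """
    if n_pub < 1 or n_priv < 1 or d < 1 or eps <= 0:
        raise ValueError("n_pub, n_priv, d must be >= 1 and eps must be > 0")
    n = n_pub + n_priv
    # Rate achievable by discarding the private data entirely.
    public_only = 1.0 / math.sqrt(n_pub)
    # Rate of private SCO on all n samples (statistical term + privacy term).
    private_all = 1.0 / math.sqrt(n) + math.sqrt(d) / (n * eps)
    return min(public_only, private_all)


def glm_unlabeled_rate(n_priv, eps):
    """Evaluate the dimension-independent rate 1/sqrt(n_priv) + 1/sqrt(n_priv*eps)."""
    if n_priv < 1 or eps <= 0:
        raise ValueError("n_priv must be >= 1 and eps must be > 0")
    return 1.0 / math.sqrt(n_priv) + 1.0 / math.sqrt(n_priv * eps)


if __name__ == "__main__":
    # Few public samples, low dimension: the private term is the smaller one.
    print(pa_dp_sco_lower_bound(n_pub=100, n_priv=10_000, d=10_000, eps=1.0))
    # Many public samples, high dimension: the public-only term is the smaller one.
    print(pa_dp_sco_lower_bound(n_pub=10_000, n_priv=100, d=10**6, eps=0.1))
    print(glm_unlabeled_rate(n_priv=10_000, eps=1.0))
```

The two calls to `pa_dp_sco_lower_bound` illustrate the two regimes of the minimum: which term is smaller depends on how `n_pub` compares with the dimension-dependent privacy cost `sqrt(d)/(n*eps)`.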