Public-data Assisted Private Stochastic Optimization: Power and Limitations

Authors: Enayat Ullah, Michael Menart, Raef Bassily, Cristóbal Guzmán, Raman Arora

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study the limits and capability of public-data assisted differentially private (PA-DP) algorithms. Specifically, we focus on the problem of stochastic convex optimization (SCO) with either labeled or unlabeled public data. For complete/labeled public data, we show that any (ϵ, δ)-PA-DP algorithm has excess risk Ω(min{1/√n_pub, 1/√n + √d/(nϵ)}) (...) These lower bounds are established via our new lower bounds for PA-DP mean estimation, which are of a similar form. First, we show a tight lower bound for the problem of differentially private stochastic convex optimization (DP-SCO) assisted with complete public data (...) For (Euclidean) GLMs we develop an efficient algorithm which, given O(n_priv·ϵ) unlabeled public data points, achieves the dimension-independent rate O(1/√n_priv + 1/√(n_priv·ϵ)). (The two headline rates are restated in display form after the table.)
Researcher Affiliation | Collaboration | Enayat Ullah (Meta; enayat@meta.com); Michael Menart (Department of Computer Science & Engineering, The Ohio State University; Department of Computer Science, University of Toronto; Vector Institute; menart.2@osu.edu); Raef Bassily (Department of Computer Science & Engineering and Translational Data Analytics Institute (TDAI), The Ohio State University; bassily.1@osu.edu); Cristóbal Guzmán (Inst. for Mathematical and Comput. Eng., Fac. de Matemáticas and Esc. de Ingeniería, Pontificia Universidad Católica de Chile; crguzmanp@uc.cl); Raman Arora (Department of Computer Science, Johns Hopkins University; arora@cs.jhu.edu)
Pseudocode | Yes | Algorithm 1: Efficient PA-DP learning of GLMs with unlabeled public data. Algorithm 2: Supervised private learning with public unlabeled data. (An illustrative sketch of this unlabeled-public-data setting appears after the table.)
Open Source Code | No | The paper does not mention providing open-source code for the methodology described. The NeurIPS Paper Checklist states 'Answer: [NA] Justification: There is no associated code.'
Open Datasets | No | The paper is theoretical and does not involve empirical evaluation on specific datasets. It defines notation such as 'S_pub, S_priv ∼ i.i.d. D' for theoretical analysis but does not refer to any specific, publicly available datasets used for training or evaluation.
Dataset Splits | No | The paper is theoretical and does not involve empirical experiments, thus no training, validation, or test dataset splits are mentioned.
Hardware Specification | No | The paper is theoretical and does not conduct experiments, thus no hardware specifications are mentioned. The NeurIPS Paper Checklist states 'Answer: [NA] Justification: There are no experiments.'
Software Dependencies | No | The paper is theoretical and does not conduct experiments, thus no specific software dependencies with version numbers are mentioned. The NeurIPS Paper Checklist states 'Answer: [NA] Justification: There are no experiments.'
Experiment Setup | No | The paper is theoretical and does not conduct experiments, thus no experimental setup details such as hyperparameters or training settings are provided. The NeurIPS Paper Checklist states 'Answer: [NA] Justification: There are no experiments.'
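
For readability, the two headline rates quoted in the Research Type row are restated in display form below (assuming, as is standard in this setting, that n_pub and n_priv denote the numbers of public and private samples, n the total number of samples, d the dimension, and ϵ the privacy parameter):

\[
\text{labeled/complete public data (lower bound):}\qquad
\Omega\!\left(\min\left\{\frac{1}{\sqrt{n_{\mathrm{pub}}}},\;\frac{1}{\sqrt{n}}+\frac{\sqrt{d}}{n\epsilon}\right\}\right),
\]
\[
\text{GLMs with } O(n_{\mathrm{priv}}\epsilon) \text{ unlabeled public samples (upper bound):}\qquad
O\!\left(\frac{1}{\sqrt{n_{\mathrm{priv}}}}+\frac{1}{\sqrt{n_{\mathrm{priv}}\epsilon}}\right).
\]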
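
The paper's pseudocode is not reproduced in this summary. As a rough illustration of the unlabeled-public-data setting referenced in the Pseudocode row, the sketch below uses public, unlabeled features to build a low-dimensional subspace and then runs a noisy (differentially private) gradient method on the private, labeled data inside that subspace. This is an assumption-laden sketch of one common way such an approach can be instantiated, not the paper's Algorithm 1 or 2; the subspace construction, the noise calibration, and all names (public_subspace, dp_sgd_glm, pa_dp_glm) are hypothetical.

import numpy as np


def public_subspace(X_pub, k=None):
    """Orthonormal basis (d x k) spanning the public feature vectors."""
    # SVD of the public design matrix; touching X_pub costs no privacy
    # because it is public and unlabeled.
    _, _, Vt = np.linalg.svd(X_pub, full_matrices=False)
    basis = Vt.T  # columns span the row space of X_pub; k <= min(n_pub, d)
    return basis if k is None else basis[:, :k]


def dp_sgd_glm(Z_priv, y_priv, eps, delta, steps=200, lr=0.1, clip=1.0, rng=None):
    """Noisy gradient descent for logistic loss in the reduced space.

    The (eps, delta) -> noise conversion below is a crude placeholder, not a
    tight privacy accountant over all `steps` iterations.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, k = Z_priv.shape
    w = np.zeros(k)
    sigma = clip * np.sqrt(2.0 * steps * np.log(1.25 / delta)) / (n * eps)
    for _ in range(steps):
        margins = np.clip(y_priv * (Z_priv @ w), -50.0, 50.0)
        per_ex_grad = (-y_priv / (1.0 + np.exp(margins)))[:, None] * Z_priv
        norms = np.linalg.norm(per_ex_grad, axis=1, keepdims=True)
        per_ex_grad *= np.minimum(1.0, clip / np.maximum(norms, 1e-12))  # clip per-example gradients
        grad = per_ex_grad.mean(axis=0) + rng.normal(0.0, sigma, size=k)
        w -= lr * grad
    return w


def pa_dp_glm(X_priv, y_priv, X_pub, eps, delta):
    """Train in the public-feature subspace, then lift back to d dimensions."""
    V = public_subspace(X_pub)               # d x k
    w_low = dp_sgd_glm(X_priv @ V, y_priv, eps, delta)
    return V @ w_low                          # parameter vector in R^d


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_priv, n_pub = 500, 2000, 100
    w_star = rng.normal(size=d)
    X_pub = rng.normal(size=(n_pub, d))
    X_priv = rng.normal(size=(n_priv, d))
    y_priv = np.sign(X_priv @ w_star + 0.1 * rng.normal(size=n_priv))
    w_hat = pa_dp_glm(X_priv, y_priv, X_pub, eps=1.0, delta=1e-5)
    print("learned parameter dimension:", w_hat.shape[0])

Heuristically, the projection caps the effective dimension at the number of public samples, so with roughly n_priv·ϵ public points the usual √(dimension)/(nϵ) privacy cost becomes about 1/√(n_priv·ϵ), which is consistent with the dimension-independent rate quoted above.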