User-Level Differentially Private Learning via Correlated Sampling

Authors: Badih Ghazi, Ravi Kumar, Pasin Manurangsi

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type Theoretical Our first main result is that, for any online learnable class, it is possible to learn the class with an (ε, δ)-DP algorithm using only O(log(1/δ)/ε) users, as long as each user has at least poly(d/α) samples (Theorem 1), where d is the Littlestone dimension and α is the error of the hypothesis output by the learner. Our second result is a generic ε-DP learner in the local model for any class C with finite probabilistic representation dimension PRDim(C). This gives a separation in the local DP model between the user-level and item-level settings; the sample complexity in the latter is known to be polynomial in the statistical query (SQ) dimension [KLN+11], which can be exponentially larger than the probabilistic representation dimension. In the central model, we get a slightly improved bound on the number of users in terms of 1/ε. We can also show that the number of users required in the above results is essentially the smallest possible (up to a 1/ε factor in Theorem 2), as stated below. Lemma 4. For any α, β ≤ 1/4, ε ∈ (0, 0.1], if there exists an (ε, δ)-DP (α, β)-accurate learner on n users for a concept class C, then we must have n ≥ Ω(min{log(1/δ), PRDim(C)}/ε).
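As a rough numerical illustration of how the quoted bounds behave (a sketch only: the constants hidden by the O(·) and Ω(·) notation are unknown and are taken as 1 here, and the function names are ours, not the paper's), the upper bound of Theorem 1 and the lower bound of Lemma 4 can be evaluated side by side:

```python
import math

def users_upper_bound(eps, delta):
    # Theorem 1-style upper bound: O(log(1/delta)/eps) users suffice.
    # Hidden constant assumed to be 1 for illustration only.
    return math.log(1 / delta) / eps

def users_lower_bound(eps, delta, prdim):
    # Lemma 4-style lower bound: Omega(min{log(1/delta), PRDim(C)}/eps)
    # users are necessary. Hidden constant again assumed to be 1.
    return min(math.log(1 / delta), prdim) / eps

eps, delta = 0.1, 1e-6
print(users_upper_bound(eps, delta))      # ≈ 138.2 users suffice
print(users_lower_bound(eps, delta, 50))  # log(1/delta) term binds: ≈ 138.2
print(users_lower_bound(eps, delta, 5))   # PRDim term binds: 50.0
```

Note that when PRDim(C) is smaller than log(1/δ), the lower bound is governed by the class itself rather than by the privacy parameter δ, which is the regime where the gap to the upper bound can open up.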
Researcher Affiliation Industry Badih Ghazi, Ravi Kumar, Pasin Manurangsi (Google, Mountain View, CA). {badihghazi, ravi.k53}@gmail.com, pasin@google.com
Pseudocode Yes Algorithm 1: Pseudo-Globally Stable Learner A′. Algorithm 2: Approximate-DP Learner in the Central Model.
Open Source Code No The paper does not provide a statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets No This is a theoretical paper that focuses on algorithm design, proofs, and bounds. It does not conduct empirical studies with specific datasets, and therefore does not mention train dataset availability.
Dataset Splits No This is a theoretical paper that focuses on algorithm design, proofs, and bounds. It does not conduct empirical studies, and therefore does not provide information on training/test/validation dataset splits.
Hardware Specification No The paper is theoretical and does not report on empirical experiments. Therefore, no hardware specifications are provided for running experiments.
Software Dependencies No The paper is theoretical and focuses on algorithms and proofs. It does not mention any specific software dependencies with version numbers required for replication of empirical experiments.
Experiment Setup No The paper is theoretical and focuses on algorithm design and analysis rather than empirical experimentation. Therefore, it does not provide details about an experimental setup, such as hyperparameters or system-level training settings.