User-Level Differentially Private Learning via Correlated Sampling
Authors: Badih Ghazi, Ravi Kumar, Pasin Manurangsi
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our first main result is that, for any online learnable class, it is possible to learn the class with an (ε, δ)-DP algorithm using only O(log(1/δ)/ε) users, as long as each user has at least poly(d/α) samples (Theorem 1), where d is the Littlestone dimension and α is the error of the hypothesis output by the learner. Our second result is a generic ε-DP learner in the local model for any class C with finite probabilistic representation dimension PRDim(C). This gives a separation in the local DP model between the user-level and item-level settings; the sample complexity in the latter is known to be polynomial in the statistical query (SQ) dimension [KLN+11], which can be exponentially larger than the probabilistic representation dimension. In the central model, we get a slightly improved bound on the number of users in terms of 1/ε. We can also show that the number of users required in the above results is essentially the smallest possible (up to a 1/ε factor in Theorem 2), as stated below. Lemma 4. For any α, β ≤ 1/4, ε ∈ (0, 0.1], and δ ≤ ε^{1.1}, if there exists an (ε, δ)-DP (α, β)-accurate learner on n users for a concept class C, then we must have n ≥ Ω(min{log(1/δ), PRDim(C)}/ε). (The key bounds are restated in cleaner notation after the table.) |
| Researcher Affiliation | Industry | Badih Ghazi, Ravi Kumar, Pasin Manurangsi; Google, Mountain View, CA. {badihghazi, ravi.k53}@gmail.com, pasin@google.com |
| Pseudocode | Yes | Algorithm 1 Pseudo-Globally Stable Learner A′. Algorithm 2 Approximate-DP Learner in the Central Model. |
| Open Source Code | No | The paper does not provide a statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | This is a theoretical paper that focuses on algorithm design, proofs, and bounds. It does not conduct empirical studies with specific datasets, and therefore does not discuss the availability of any training dataset. |
| Dataset Splits | No | This is a theoretical paper that focuses on algorithm design, proofs, and bounds. It does not conduct empirical studies, and therefore does not provide information on training/test/validation dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not report on empirical experiments. Therefore, no hardware specifications are provided for running experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithms and proofs. It does not mention any specific software dependencies with version numbers required for replication of empirical experiments. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and analysis rather than empirical experimentation. Therefore, it does not provide details about an experimental setup, such as hyperparameters or system-level training settings. |
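
For readability, the bounds quoted in the Research Type row above are restated below in LaTeX. Here ε and δ are the privacy parameters, α and β the accuracy and failure-probability parameters, d the Littlestone dimension, and PRDim(C) the probabilistic representation dimension. This is only a cleaned-up rendering of the quoted claims, not a new result.

```latex
% Theorem 1 (as quoted): user and per-user sample requirements for
% (\varepsilon, \delta)-DP learning of an online-learnable class.
n_{\text{users}} \;=\; O\!\left(\frac{\log(1/\delta)}{\varepsilon}\right),
\qquad
n_{\text{samples per user}} \;\ge\; \operatorname{poly}\!\left(d/\alpha\right).

% Lemma 4 (as quoted): lower bound on the number of users for any
% (\varepsilon, \delta)-DP, (\alpha, \beta)-accurate learner for a class C.
n \;\ge\; \Omega\!\left(\frac{\min\{\log(1/\delta),\ \operatorname{PRDim}(C)\}}{\varepsilon}\right).
```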
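The paper's title refers to correlated sampling, which the report above does not describe. The generic primitive is standard: parties holding nearby distributions draw samples using shared randomness so that their outputs coincide with high probability. The Python sketch below illustrates that generic primitive over a small finite domain; the function name `correlated_sample` and the toy distributions are illustrative assumptions, and this is not a reconstruction of the paper's Algorithm 1 or 2.

```python
import random


def correlated_sample(prob, domain, shared_seed):
    """Rejection sampling driven entirely by shared randomness.

    `prob` maps each element of `domain` to its probability. Two parties that
    call this with the same `domain` and `shared_seed` but slightly different
    `prob` dictionaries return the same element except with probability at most
    roughly twice the total variation distance between their distributions.
    """
    rng = random.Random(shared_seed)
    while True:
        candidate = rng.choice(domain)   # shared uniform candidate
        threshold = rng.random()         # shared uniform threshold in [0, 1)
        # Accept the first candidate whose shared threshold falls below its
        # probability; conditioned on acceptance, the output is distributed
        # according to `prob`.
        if threshold < prob.get(candidate, 0.0):
            return candidate


# Toy usage: two "users" with nearby empirical distributions over hypotheses
# almost always agree on the sampled hypothesis when they share the seed.
domain = ["h1", "h2", "h3"]
p_user_a = {"h1": 0.70, "h2": 0.20, "h3": 0.10}
p_user_b = {"h1": 0.68, "h2": 0.22, "h3": 0.10}
agree = sum(
    correlated_sample(p_user_a, domain, seed) == correlated_sample(p_user_b, domain, seed)
    for seed in range(1000)
)
print(f"agreement rate: {agree / 1000:.3f}")
```

With the same shared randomness, the two callers disagree with probability at most about twice the total variation distance between their distributions; this kind of agreement guarantee is what lets many users' outputs be aggregated stably in user-level DP constructions of this flavor.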