Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
User-Level Differentially Private Learning via Correlated Sampling
Authors: Badih Ghazi, Ravi Kumar, Pasin Manurangsi
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our first main result is that, for any online learnable class, it is possible to learn the class with an (", δ)-DP algorithm using only O(log(1/δ)/") users, as long as each user has at least poly(d/ ) samples (Theorem 1), where d is the Littlestone dimension and is the error of the hypothesis output by the learner. Our second result is a generic "-DP learner in the local model for any class C with finite probabilistic representation dimension PRDim(C). This gives a separation in the local DP model between the user-level and item-level settings; the sample complexity in the latter is known to be polynomial in the statistical query (SQ) dimension [KLN+11], which can be exponentially larger than the probabilistic representation dimension. In the central model, we get a slightly improved bound on the number of users in terms of 1/". We can also show that the number of users required in the above results is essentially the smallest possible (up to a 1/" factor in Theorem 2), as stated below. Lemma 4. For any , β 1/4, " 2 (0, 0.1"1.1), if there exists an (", δ)DP ( , β)-accurate learner on n users for a concept class C, then we must have n (min{log(1/δ), PRDim(C)}/"). |
| Researcher Affiliation | Industry | Badih Ghazi Ravi Kumar Pasin Manurangsi Google, Mountain View, CA. EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 Pseudo-Globally Stable Learner A0. Algorithm 2 Approximate-DP Learner in the Central Model. |
| Open Source Code | No | The paper does not provide a statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | This is a theoretical paper that focuses on algorithm design, proofs, and bounds. It does not conduct empirical studies with specific datasets, and therefore does not mention train dataset availability. |
| Dataset Splits | No | This is a theoretical paper that focuses on algorithm design, proofs, and bounds. It does not conduct empirical studies, and therefore does not provide information on training/test/validation dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not report on empirical experiments. Therefore, no hardware specifications are provided for running experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithms and proofs. It does not mention any specific software dependencies with version numbers required for replication of empirical experiments. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and analysis rather than empirical experimentation. Therefore, it does not provide details about an experimental setup, such as hyperparameters or system-level training settings. |