User-Level Differentially Private Learning via Correlated Sampling
Authors: Badih Ghazi, Ravi Kumar, Pasin Manurangsi
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our first main result is that, for any online learnable class, it is possible to learn the class with an (ε, δ)-DP algorithm using only O(log(1/δ)/ε) users, as long as each user has at least poly(d/α) samples (Theorem 1), where d is the Littlestone dimension and α is the error of the hypothesis output by the learner. Our second result is a generic ε-DP learner in the local model for any class C with finite probabilistic representation dimension PRDim(C). This gives a separation in the local DP model between the user-level and item-level settings; the sample complexity in the latter is known to be polynomial in the statistical query (SQ) dimension [KLN+11], which can be exponentially larger than the probabilistic representation dimension. In the central model, we get a slightly improved bound on the number of users in terms of 1/ε. We can also show that the number of users required in the above results is essentially the smallest possible (up to a 1/ε factor in Theorem 2), as stated below. Lemma 4. For any α, β ≤ 1/4, ε ∈ (0, 0.1], and δ ≤ ε^{1.1}, if there exists an (ε, δ)-DP (α, β)-accurate learner on n users for a concept class C, then we must have n ≥ Ω(min{log(1/δ), PRDim(C)}/ε). (The key bounds are restated in cleaner notation after the table.) |
| Researcher Affiliation | Industry | Badih Ghazi, Ravi Kumar, Pasin Manurangsi; Google, Mountain View, CA. {badihghazi, ravi.k53}@gmail.com, pasin@google.com |
| Pseudocode | Yes | Algorithm 1 Pseudo-Globally Stable Learner A′. Algorithm 2 Approximate-DP Learner in the Central Model. |
| Open Source Code | No | The paper does not provide a statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | This is a theoretical paper that focuses on algorithm design, proofs, and bounds. It does not conduct empirical studies with specific datasets, and therefore does not discuss the availability of any training dataset. |
| Dataset Splits | No | This is a theoretical paper that focuses on algorithm design, proofs, and bounds. It does not conduct empirical studies, and therefore does not provide information on training/test/validation dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not report on empirical experiments. Therefore, no hardware specifications are provided for running experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithms and proofs. It does not mention any specific software dependencies with version numbers required for replication of empirical experiments. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and analysis rather than empirical experimentation. Therefore, it does not provide details about an experimental setup, such as hyperparameters or system-level training settings. |
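
For readability, the bounds quoted in the Research Type row above are restated below in LaTeX. Here ε and δ are the privacy parameters, α and β the accuracy and failure-probability parameters, d the Littlestone dimension, and PRDim(C) the probabilistic representation dimension. This is only a cleaned-up rendering of the quoted claims, not a new result.

```latex
% Theorem 1 (as quoted): user and per-user sample requirements for
% (\varepsilon, \delta)-DP learning of an online-learnable class.
n_{\text{users}} \;=\; O\!\left(\frac{\log(1/\delta)}{\varepsilon}\right),
\qquad
n_{\text{samples per user}} \;\ge\; \operatorname{poly}\!\left(d/\alpha\right).

% Lemma 4 (as quoted): lower bound on the number of users for any
% (\varepsilon, \delta)-DP, (\alpha, \beta)-accurate learner for a class C.
n \;\ge\; \Omega\!\left(\frac{\min\{\log(1/\delta),\ \operatorname{PRDim}(C)\}}{\varepsilon}\right).
```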
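The paper's title refers to correlated sampling, which the report above does not describe. The generic primitive is standard: parties holding nearby distributions draw samples using shared randomness so that their outputs coincide with high probability. The Python sketch below illustrates that generic primitive over a small finite domain; the function name `correlated_sample` and the toy distributions are illustrative assumptions, and this is not a reconstruction of the paper's Algorithm 1 or 2.

```python
import random


def correlated_sample(prob, domain, shared_seed):
    """Rejection sampling driven entirely by shared randomness.

    `prob` maps each element of `domain` to its probability. Two parties that
    call this with the same `domain` and `shared_seed` but slightly different
    `prob` dictionaries return the same element except with probability at most
    roughly twice the total variation distance between their distributions.
    """
    rng = random.Random(shared_seed)
    while True:
        candidate = rng.choice(domain)   # shared uniform candidate
        threshold = rng.random()         # shared uniform threshold in [0, 1)
        # Accept the first candidate whose shared threshold falls below its
        # probability; conditioned on acceptance, the output is distributed
        # according to `prob`.
        if threshold < prob.get(candidate, 0.0):
            return candidate


# Toy usage: two "users" with nearby empirical distributions over hypotheses
# almost always agree on the sampled hypothesis when they share the seed.
domain = ["h1", "h2", "h3"]
p_user_a = {"h1": 0.70, "h2": 0.20, "h3": 0.10}
p_user_b = {"h1": 0.68, "h2": 0.22, "h3": 0.10}
agree = sum(
    correlated_sample(p_user_a, domain, seed) == correlated_sample(p_user_b, domain, seed)
    for seed in range(1000)
)
print(f"agreement rate: {agree / 1000:.3f}")
```

With the same shared randomness, the two callers disagree with probability at most about twice the total variation distance between their distributions; this kind of agreement guarantee is what lets many users' outputs be aggregated stably in user-level DP constructions of this flavor.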