Limits of Private Learning with Access to Public Data

Authors: Noga Alon, Raef Bassily, Shay Moran

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study the limits of learning in this setting in terms of private and public sample complexities. We show that any hypothesis class of VC-dimension d can be agnostically learned up to an excess error of α using only (roughly) d/α public examples and d/α² private labeled examples. This result holds even when the public examples are unlabeled. This gives a quadratic improvement over the standard d/α² upper bound on the public sample complexity (where private examples can be ignored altogether if the public examples are labeled). Furthermore, we give a nearly matching lower bound, which we prove via a generic reduction from this setting to the one of private learning without public data.
Researcher Affiliation | Collaboration | Noga Alon, Department of Mathematics, Princeton University, nalon@math.princeton.edu; Raef Bassily, Department of Computer Science & Engineering, The Ohio State University, bassily.1@osu.edu; Shay Moran, Google AI Princeton, shaymoran1@gmail.com
Pseudocode | Yes | Algorithm 1: ASSPP, Semi-Supervised Semi-Private Agnostic Learner; Algorithm 2: Description of the private learner B; Algorithm 3: Private Sample Generator PrivSamp; Algorithm 4: Public Sample Generator PubSamp
Open Source Code | No | The paper contains no explicit statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper is theoretical and focuses on sample complexities and learning bounds. It defines notions such as "private sample" and "public sample" but does not use or name a publicly available dataset for empirical evaluation.
Dataset Splits | No | The paper is theoretical and conducts no empirical experiments with training, validation, or test splits.
Hardware Specification | No | The paper is theoretical and describes no empirical experiments, so no hardware specifications are provided.
Software Dependencies | No | The paper is theoretical and presents algorithms and proofs; it does not describe empirical experiments requiring specific software versions.
Experiment Setup | No | The paper is theoretical and describes no empirical experiments, so there are no experimental setup details such as hyperparameters or system-level training settings.
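The headline bounds quoted in the Research Type row can be restated compactly. This is a sketch of the abstract's claim only; the tilde-O notation (suppressing logarithmic factors and constants) stands in for the paper's informal "roughly" and is my own shorthand:

```latex
% Semi-private agnostic learning of a hypothesis class H with
% VC-dimension d, up to excess error \alpha (upper bound):
\[
  n_{\mathrm{pub}} \;=\; \tilde{O}\!\left(\frac{d}{\alpha}\right)
  \quad \text{(public, possibly unlabeled) examples,}
  \qquad
  n_{\mathrm{priv}} \;=\; \tilde{O}\!\left(\frac{d}{\alpha^{2}}\right)
  \quad \text{(private, labeled) examples.}
\]
% Baseline for comparison: with d/\alpha^2 *labeled* public examples,
% private data can be ignored entirely, so the d/\alpha public bound
% is a quadratic improvement in the \alpha-dependence.
```

The paper also gives a nearly matching lower bound via a reduction to private learning without public data, so the α-dependence above is essentially tight.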