Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Limits of Private Learning with Access to Public Data
Authors: Noga Alon, Raef Bassily, Shay Moran
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the limits of learning in this setting in terms of private and public sample complexities. We show that any hypothesis class of VC-dimension d can be agnostically learned up to an excess error of α using only (roughly) d/α public examples and d/α2 private labeled examples. This result holds even when the public examples are unlabeled. This gives a quadratic improvement over the standard d/α2 upper bound on the public sample complexity (where private examples can be ignored altogether if the public examples are labeled). Furthermore, we give a nearly matching lower bound, which we prove via a generic reduction from this setting to the one of private learning without public data. |
| Researcher Affiliation | Collaboration | Noga Alon Department of Mathematics Princeton University EMAIL Raef Bassily Department of Computer Science & Engineering The Ohio State University EMAIL Shay Moran Google AI Princeton EMAIL |
| Pseudocode | Yes | Algorithm 1 ASSPP: Semi-Supervised Semi-Private Agnostic Learner... Algorithm 2 Description of the private learner B:... Algorithm 3 Private Sample Generator Priv Samp:... Algorithm 4 Public Sample Generator Pub Samp: |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and focuses on sample complexities and bounds for learning. It defines concepts like "private sample" and "public sample" but does not use or specify a named, publicly available dataset for empirical evaluation. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments with data splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe empirical experiments, thus no hardware specifications for running experiments are provided. |
| Software Dependencies | No | The paper is theoretical and does not describe empirical experiments requiring specific software versions. It presents algorithms and proofs. |
| Experiment Setup | No | The paper is theoretical and does not describe empirical experiments. Therefore, there are no specific experimental setup details like hyperparameters or system-level training settings. |