Private Hypothesis Selection
Authors: Mark Bun, Gautam Kamath, Thomas Steinke, Zhiwei Steven Wu
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We provide a differentially private algorithm for hypothesis selection. Given samples from an unknown probability distribution P and a set of m probability distributions H, the goal is to output, in an ε-differentially private manner, a distribution from H whose total variation distance to P is comparable to that of the best such distribution (which we denote by α). The sample complexity of our basic algorithm is O(log m/α² + log m/(αε)), representing a minimal cost for privacy when compared to the non-private algorithm. We can also handle infinite hypothesis classes H by relaxing to (ε, δ)-differential privacy. We apply our hypothesis selection algorithm to give learning algorithms for a number of natural distribution classes, including Gaussians, product distributions, sums of independent random variables, piecewise polynomials, and mixture classes. Our hypothesis selection procedure allows us to generically convert a cover for a class to a learning algorithm, complementing known learning lower bounds which are in terms of the packing number of the class. As the covering and packing numbers are often closely related, for constant α, our algorithms achieve the optimal sample complexity for many classes of interest. Finally, we describe an application to private distribution-free PAC learning. [An illustration of the selection mechanism appears after the table.] |
| Researcher Affiliation | Collaboration | Mark Bun, Department of Computer Science, Boston University, mbun@bu.edu; Gautam Kamath, Cheriton School of Computer Science, University of Waterloo, g@csail.mit.edu; Thomas Steinke, IBM Research, phs@thomas-steinke.net; Zhiwei Steven Wu, Department of Computer Science & Engineering, University of Minnesota, zsw@umn.edu |
| Pseudocode | Yes | Algorithm 1: PAIRWISE CONTEST: PC(H, H′, D, ζ, α) [...] Algorithm 2: PRIVATE HYPOTHESIS SELECTION: PHS(H, D, ε) [A hedged sketch of these procedures appears after the table.] |
| Open Source Code | No | The paper does not provide any explicit statements or links about open-sourcing the code for the described methodology. |
| Open Datasets | No | The paper is theoretical and works with abstract 'samples from an unknown probability distribution P' and a 'dataset D = {X1, . . . , Xn}'. It does not specify or use any named, publicly available datasets. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments involving dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not discuss any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention any specific software dependencies or versions required for implementation or experiments. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and theoretical guarantees. It does not provide details of an experimental setup, such as hyperparameters or system-level training settings. |
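
The ε-differential-privacy guarantee described in the abstract comes from selecting a hypothesis with the exponential mechanism of McSherry and Talwar. As a hedged illustration (the score notation S and sensitivity Δ below are our labels, not the paper's), the mechanism outputs each candidate with probability:

```latex
% Exponential mechanism over the m candidate hypotheses.
% S(D, H_j) is a data-dependent score (in the paper, derived from pairwise
% Scheffe contests) with sensitivity \Delta; the paper designs the score so
% that \Delta is small, which yields the log(m)/(alpha * epsilon) privacy cost.
\Pr\big[\text{output } H_j\big] \;\propto\; \exp\!\left(\frac{\varepsilon\, S(D, H_j)}{2\Delta}\right)
```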
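
For concreteness, here is a minimal Python sketch of the two-stage structure named in Algorithms 1 and 2: Scheffé-style pairwise contests followed by exponential-mechanism selection. Everything below (the function names, the win-count score, and treating that score's sensitivity as 1) is an illustrative simplification over discrete distributions, not the paper's exact PC/PHS procedures.

```python
import numpy as np

def scheffe_contest(h_i, h_j, samples):
    """Scheffé-style contest between two hypotheses given as probability
    vectors over a finite domain {0, ..., k-1}. Returns True if h_i's
    predicted mass of the Scheffé set is at least as close to the empirical
    mass as h_j's. Illustrative stand-in for the paper's PAIRWISE CONTEST."""
    W = h_i > h_j                              # Scheffé set: where h_i puts more mass
    p_i, p_j = h_i[W].sum(), h_j[W].sum()      # each hypothesis's predicted mass of W
    tau = np.isin(samples, np.flatnonzero(W)).mean()  # empirical mass of W
    return abs(tau - p_i) <= abs(tau - p_j)

def private_hypothesis_selection(hypotheses, samples, eps, rng=None):
    """Pick a hypothesis index via the exponential mechanism, scoring each
    hypothesis by how many pairwise contests it survives. NOTE: treating
    this win count as a sensitivity-1 score is a simplification; the paper
    constructs a different low-sensitivity score to get a true eps-DP bound."""
    rng = np.random.default_rng() if rng is None else rng
    m = len(hypotheses)
    scores = np.array([
        sum(scheffe_contest(hypotheses[i], hypotheses[j], samples)
            for j in range(m) if j != i)
        for i in range(m)
    ], dtype=float)
    logits = eps * scores / 2.0                # exponential mechanism weights
    probs = np.exp(logits - logits.max())      # subtract max for numerical stability
    probs /= probs.sum()
    return rng.choice(m, p=probs)

# Toy usage: three distributions over a domain of size 4; sample from the first.
rng = np.random.default_rng(0)
H = [np.array([0.7, 0.1, 0.1, 0.1]),
     np.array([0.25, 0.25, 0.25, 0.25]),
     np.array([0.1, 0.1, 0.1, 0.7])]
data = rng.choice(4, size=500, p=H[0])
print(private_hypothesis_selection(H, data, eps=1.0, rng=rng))
```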