Learning with User-Level Privacy

Authors: Daniel Levy, Ziteng Sun, Kareem Amin, Satyen Kale, Alex Kulesza, Mehryar Mohri, Ananda Theertha Suresh

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We show that for high-dimensional mean estimation, empirical risk minimization with smooth losses, stochastic convex optimization, and learning hypothesis classes with finite metric entropy, the privacy cost decreases as O(1/√m) as users provide more samples. In contrast, when increasing the number of users n, the privacy cost decreases at a faster O(1/n) rate. We complement these results with lower bounds showing the minimax optimality of our algorithms for mean estimation and stochastic convex optimization. Our algorithms rely on novel techniques for private mean estimation in arbitrary dimension with error scaling as the concentration radius τ of the distribution rather than the entire range. (A back-of-envelope derivation of these rates follows the table.)
Researcher Affiliation | Collaboration | Daniel Levy (Stanford University), Ziteng Sun (Cornell University), Kareem Amin (Google Research), Satyen Kale (Google Research), Alex Kulesza (Google Research), Mehryar Mohri (Google Research and Courant Institute), Ananda Theertha Suresh (Google Research)
Pseudocode | Yes | Algorithm 1 WinsorizedMean1D(X^n, ε, τ, B): Winsorized Mean Estimator (WME); Algorithm 2 WinsorizedMeanHighD(X^n, ε, δ, τ, B, γ): WME in High Dimension; Algorithm 3 Winsorized First-Order Optimization; Algorithm 4 A_PhasedERM: Phased ERM. (A minimal code sketch of the winsorizing idea follows the table.)
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and discusses 'samples from each user's distribution' and the 'user-level dataset S' in general terms, but does not specify or provide access information for any publicly available or open datasets for empirical training.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments with dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not mention any hardware specifications used for experiments.
Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not provide specific experimental setup details, such as concrete hyperparameter values or training configurations for empirical runs.
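
The O(1/√m) versus O(1/n) comparison quoted in the Research Type row admits a short back-of-envelope derivation. The sketch below is our own illustration of the standard concentration argument, not text from the paper; σ denotes an assumed per-sample sub-Gaussian scale and τ the concentration radius of the per-user averages.

```latex
% Heuristic sketch (our illustration, assuming 1-D sub-Gaussian samples).
% User i holds m i.i.d. samples; the user's local average concentrates:
\[
  \bar{X}_i = \frac{1}{m}\sum_{j=1}^{m} X_{i,j},
  \qquad
  \left|\bar{X}_i - \mu\right| \lesssim \tau
  \quad\text{with}\quad
  \tau \approx \frac{\sigma\sqrt{\log n}}{\sqrt{m}} .
\]
% Winsorizing each \bar{X}_i to an interval of width O(\tau) bounds the
% sensitivity of their average by O(\tau/n), so Laplace noise of scale
% O(\tau/(n\varepsilon)) suffices for \varepsilon-DP:
\[
  \text{privacy error}
  \;\approx\; \frac{\tau}{n\varepsilon}
  \;\approx\; \frac{\sigma\sqrt{\log n}}{n\sqrt{m}\,\varepsilon},
\]
% which shrinks as O(1/\sqrt{m}) in the per-user sample count m but at
% the faster O(1/n) rate in the number of users n, as the abstract states.
```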
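The winsorized mean estimators listed in the Pseudocode row follow the same clip-then-add-noise pattern. Below is a minimal Python sketch of the one-dimensional case, assuming a rough center of the distribution is already known; the paper's Algorithm 1 instead estimates that center privately and takes the full parameter set (X^n, ε, τ, B). The function name and the heuristic choice of τ are ours, for illustration only.

```python
import numpy as np

def winsorized_mean_1d(user_samples, eps, tau, center, rng=None):
    """Illustrative user-level DP winsorized mean (1-D).

    Not the paper's exact Algorithm 1: we assume a rough `center` is
    given, whereas the paper estimates it privately from the data.

    user_samples: array of shape (n_users, m), m samples per user.
    tau: assumed concentration radius of per-user averages around `center`.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = user_samples.shape[0]
    # Each user contributes only their local average; with m samples it
    # concentrates within ~tau of the true mean (tau shrinks as 1/sqrt(m)).
    user_means = user_samples.mean(axis=1)
    # Winsorize: clip to [center - tau, center + tau]. Replacing one
    # user's data then moves the sum by at most 2*tau, so the average
    # has sensitivity 2*tau/n -- proportional to tau, not the range B.
    clipped = np.clip(user_means, center - tau, center + tau)
    sensitivity = 2.0 * tau / n
    # Laplace mechanism calibrated to the clipped sensitivity.
    return clipped.mean() + rng.laplace(scale=sensitivity / eps)

# Toy usage: n = 1000 users, m = 100 samples each, sigma = 1.
rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=1.0, size=(1000, 100))
tau = 4.0 / np.sqrt(100)            # heuristic radius ~ sigma/sqrt(m)
print(winsorized_mean_1d(data, eps=1.0, tau=tau, center=1.45, rng=rng))
```

Because the clipping width scales with τ rather than the full range B, the injected Laplace noise inherits the same τ scaling; this is the mechanism behind the "error scaling as the concentration radius τ" claim quoted above.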