Learning with User-Level Privacy

Authors: Daniel Levy, Ziteng Sun, Kareem Amin, Satyen Kale, Alex Kulesza, Mehryar Mohri, Ananda Theertha Suresh

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We show that for high-dimensional mean estimation, empirical risk minimization with smooth losses, stochastic convex optimization, and learning hypothesis classes with finite metric entropy, the privacy cost decreases as O(1/√m) as users provide more samples. In contrast, when increasing the number of users n, the privacy cost decreases at a faster O(1/n) rate. We complement these results with lower bounds showing the minimax optimality of our algorithms for mean estimation and stochastic convex optimization. Our algorithms rely on novel techniques for private mean estimation in arbitrary dimension with error scaling as the concentration radius τ of the distribution rather than the entire range. (A back-of-envelope derivation of these rates follows the table.)
Researcher Affiliation | Collaboration | Daniel Levy (Stanford University), Ziteng Sun (Cornell University), Kareem Amin (Google Research), Satyen Kale (Google Research), Alex Kulesza (Google Research), Mehryar Mohri (Google Research and Courant Institute), Ananda Theertha Suresh (Google Research)
Pseudocode | Yes | Algorithm 1 WinsorizedMean1D(X^n, ε, τ, B): Winsorized Mean Estimator (WME); Algorithm 2 WinsorizedMeanHighD(X^n, ε, δ, τ, B, γ): WME in High Dimension; Algorithm 3 Winsorized First-Order Optimization; Algorithm 4 A_PhasedERM: Phased ERM. (A minimal code sketch of the winsorizing idea follows the table.)
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and discusses 'samples from each user's distribution' and the 'user-level dataset S' in general terms, but does not specify or provide access information for any publicly available or open datasets for empirical training.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments with dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not mention any hardware specifications used for experiments.
Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not provide specific experimental setup details, such as concrete hyperparameter values or training configurations for empirical runs.
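
The O(1/√m) versus O(1/n) comparison quoted in the Research Type row admits a short back-of-envelope derivation. The sketch below is our own illustration of the standard concentration argument, not text from the paper; σ denotes an assumed per-sample sub-Gaussian scale and τ the concentration radius of the per-user averages.

```latex
% Heuristic sketch (our illustration, assuming 1-D sub-Gaussian samples).
% User i holds m i.i.d. samples; the user's local average concentrates:
\[
  \bar{X}_i = \frac{1}{m}\sum_{j=1}^{m} X_{i,j},
  \qquad
  \left|\bar{X}_i - \mu\right| \lesssim \tau
  \quad\text{with}\quad
  \tau \approx \frac{\sigma\sqrt{\log n}}{\sqrt{m}} .
\]
% Winsorizing each \bar{X}_i to an interval of width O(\tau) bounds the
% sensitivity of their average by O(\tau/n), so Laplace noise of scale
% O(\tau/(n\varepsilon)) suffices for \varepsilon-DP:
\[
  \text{privacy error}
  \;\approx\; \frac{\tau}{n\varepsilon}
  \;\approx\; \frac{\sigma\sqrt{\log n}}{n\sqrt{m}\,\varepsilon},
\]
% which shrinks as O(1/\sqrt{m}) in the per-user sample count m but at
% the faster O(1/n) rate in the number of users n, as the abstract states.
```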
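The winsorized mean estimators listed in the Pseudocode row follow the same clip-then-add-noise pattern. Below is a minimal Python sketch of the one-dimensional case, assuming a rough center of the distribution is already known; the paper's Algorithm 1 instead estimates that center privately and takes the full parameter set (X^n, ε, τ, B). The function name and the heuristic choice of τ are ours, for illustration only.

```python
import numpy as np

def winsorized_mean_1d(user_samples, eps, tau, center, rng=None):
    """Illustrative user-level DP winsorized mean (1-D).

    Not the paper's exact Algorithm 1: we assume a rough `center` is
    given, whereas the paper estimates it privately from the data.

    user_samples: array of shape (n_users, m), m samples per user.
    tau: assumed concentration radius of per-user averages around `center`.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = user_samples.shape[0]
    # Each user contributes only their local average; with m samples it
    # concentrates within ~tau of the true mean (tau shrinks as 1/sqrt(m)).
    user_means = user_samples.mean(axis=1)
    # Winsorize: clip to [center - tau, center + tau]. Replacing one
    # user's data then moves the sum by at most 2*tau, so the average
    # has sensitivity 2*tau/n -- proportional to tau, not the range B.
    clipped = np.clip(user_means, center - tau, center + tau)
    sensitivity = 2.0 * tau / n
    # Laplace mechanism calibrated to the clipped sensitivity.
    return clipped.mean() + rng.laplace(scale=sensitivity / eps)

# Toy usage: n = 1000 users, m = 100 samples each, sigma = 1.
rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=1.0, size=(1000, 100))
tau = 4.0 / np.sqrt(100)            # heuristic radius ~ sigma/sqrt(m)
print(winsorized_mean_1d(data, eps=1.0, tau=tau, center=1.45, rng=rng))
```

Because the clipping width scales with τ rather than the full range B, the injected Laplace noise inherits the same τ scaling; this is the mechanism behind the "error scaling as the concentration radius τ" claim quoted above.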