Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data
Authors: Gautam Kamath, Xingtu Liu, Huanyu Zhang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study stochastic convex optimization with heavy-tailed data under the constraint of differential privacy (DP). Most prior work on this problem is restricted to the case where the loss function is Lipschitz. Instead, as introduced by Wang, Xiao, Devadas, and Xu (Wang et al., 2020), we study general convex loss functions with the assumption that the distribution of gradients has bounded k-th moments. We provide improved upper bounds on the excess population risk under concentrated DP for convex and strongly convex loss functions. Along the way, we derive new algorithms for private mean estimation of heavy-tailed distributions, under both pure and concentrated DP. Finally, we prove nearly-matching lower bounds for private stochastic convex optimization with strongly convex losses and mean estimation, showing new separations between pure and concentrated DP. [the bounded-moment assumption is restated formally below the table] |
| Researcher Affiliation | Collaboration | ¹Cheriton School of Computer Science, University of Waterloo; ²Meta. Correspondence to: Huanyu Zhang <huanyuzhang@fb.com>. |
| Pseudocode | Yes | Algorithm 1: SCO algorithmic framework \(SCOF_{\eta, T, \text{MeanOracle}}(X)\)...; Algorithm 2: CDP-SCO algorithm with heavy-tailed data...; Algorithm 3: High-Dimensional Mean Estimator...; Algorithm 4: CDP Noise Smoothing Mean Estimator... [a Python sketch of the Algorithm 1 framework follows the table] |
| Open Source Code | No | The paper does not include any explicit statements about open-source code availability for the described methodology, nor does it provide links to code repositories in the main text or acknowledgments. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and proofs, rather than conducting empirical experiments with specific datasets. While it refers to a theoretical "dataset \(x_1, \ldots, x_n\) drawn i.i.d. from some unknown distribution \(D\)" in its problem setup, it does not use or provide access information for any publicly available or open dataset. |
| Dataset Splits | No | The paper is theoretical and does not report on empirical experiments involving data partitioning. Therefore, it does not provide specific dataset split information (e.g., percentages, sample counts, or predefined splits) for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithms and proofs rather than empirical experiments. Therefore, it does not specify any hardware details (e.g., GPU/CPU models, processors, memory) used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not present empirical experiments that would require specific software dependencies with version numbers for reproducibility. The algorithms are presented in pseudocode form, without implementation details. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithms and proofs, not empirical experiments. While it defines theoretical parameters for its algorithms (e.g., \(\eta\), \(T\), \(\tau\), \(\rho\)), these are part of the theoretical framework and not details for an experimental setup, such as hyperparameters or training settings. |
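For concreteness, the heavy-tailed assumption quoted in the abstract above admits a short formal statement. The following is a minimal sketch, assuming a per-coordinate moment bound; the symbols \(\lambda\), \(\mathcal{C}\), and \(\mathcal{L}\) are notation of this sketch, not necessarily the paper's.

```latex
% Minimal sketch of the problem setup (notation assumed, not verbatim):
% data x_1, ..., x_n drawn i.i.d. from an unknown distribution D, and a
% loss f(theta; x) that is convex in theta over a convex constraint set C.
\[
  \mathbb{E}_{x \sim D}\big[\, \lvert \nabla_j f(\theta; x) \rvert^{k} \,\big] \le \lambda
  \quad \text{for every coordinate } j \text{ and all } \theta \in \mathcal{C},
\]
\[
  \text{goal: privately output } \hat{\theta} \text{ with small excess population risk}
  \quad
  \mathcal{L}(\hat{\theta}) - \min_{\theta \in \mathcal{C}} \mathcal{L}(\theta),
  \qquad
  \mathcal{L}(\theta) := \mathbb{E}_{x \sim D}\big[ f(\theta; x) \big].
\]
```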
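The Pseudocode row lists Algorithm 1 as a generic SCO framework \(SCOF_{\eta, T, \text{MeanOracle}}\) parameterized by a step size, an iteration count, and a pluggable private mean oracle. Below is a minimal Python sketch of that control flow, assuming projected gradient descent over disjoint per-iteration batches; `private_mean_oracle` is an illustrative clip-and-Gaussian-noise stand-in, not the paper's Algorithm 3 or 4, and all function and parameter names here are hypothetical.

```python
import numpy as np

def private_mean_oracle(grads, clip=1.0, noise_scale=1.0, rng=None):
    """Illustrative stand-in for a private mean estimator (NOT the paper's
    Algorithms 3-4): clip per-sample gradients in L2 norm, average, and add
    Gaussian noise. The paper's heavy-tailed estimators differ."""
    rng = rng if rng is not None else np.random.default_rng()
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    clipped = grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    return clipped.mean(axis=0) + rng.normal(0.0, noise_scale, grads.shape[1])

def sco_framework(X, grad_fn, project, theta0, eta=0.1, T=100,
                  mean_oracle=private_mean_oracle):
    """Hypothetical sketch of an SCOF_{eta, T, MeanOracle}-style framework:
    at each step, estimate the population gradient by applying the private
    mean oracle to per-sample gradients on a fresh batch, then take a
    projected gradient step."""
    assert len(X) >= T, "need at least one sample per iteration"
    theta = np.asarray(theta0, dtype=float)
    batches = np.array_split(np.arange(len(X)), T)  # disjoint batches
    for t in range(T):
        grads = np.stack([grad_fn(theta, X[i]) for i in batches[t]])
        g_hat = mean_oracle(grads)
        theta = project(theta - eta * g_hat)  # project back onto C
    return theta
```

In the paper, Algorithm 2 presumably instantiates this skeleton with the concentrated-DP mean estimators of Algorithms 3 and 4; the sketch above mirrors only the control flow, not the noise calibration or the resulting risk bounds.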