Differentially Private Empirical Risk Minimization with Non-convex Loss Functions

Authors: Di Wang, Changyou Chen, Jinhui Xu

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study the problem of Empirical Risk Minimization (ERM) with (smooth) non-convex loss functions under the differential-privacy (DP) model. We first study the expected excess empirical (or population) risk, which was primarily used as the utility measure for convex loss functions. Specifically, we show that the excess empirical (or population) risk can be upper bounded by O(d log(1/δ) / (ε² log n)) in the (ε, δ)-DP setting, where n is the data size and d is the dimensionality of the space. The 1/log n term in the empirical risk bound can be further improved to 1/n^Ω(1) (when d is a constant) by a highly non-trivial analysis of the time-average error. To obtain more efficient solutions, we also consider the connection between achieving differential privacy and finding an approximate local minimum. In particular, we show that when the data size n is large enough, there are (ε, δ)-DP algorithms that can find an approximate local minimum of the empirical risk with high probability, in both the constrained and unconstrained settings. These results indicate that one can escape saddle points privately.
Researcher Affiliation | Academia | Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, USA. Emails: {dwang45,changyou,jinhui}@buffalo.edu.
Pseudocode | Yes | Algorithm 1 DP-GLD; Algorithm 2 DP-FW-L2; Algorithm 3 DP-FW-L1; Algorithm 4 DP-GD; Algorithm 5 Selecting SOSP; Algorithm 6 DP-GD-SO
Open Source Code | No | The paper does not explicitly state that open-source code for the methodology is released, nor does it include a link to a code repository.
Open Datasets | No | The paper is theoretical and does not use a specific, named dataset. It refers generally to a dataset D = {z₁ = (x₁, y₁), z₂ = (x₂, y₂), ..., zₙ = (xₙ, yₙ)} from a data universe, which is a general problem setup, not a specific, publicly available dataset.
Dataset Splits | No | The paper is theoretical and does not involve experimental validation. Therefore, it does not specify training/test/validation dataset splits.
Hardware Specification | No | The paper is theoretical and focuses on algorithm design and proofs, not experimental implementation. Therefore, it does not describe the hardware used for experiments.
Software Dependencies | No | The paper is theoretical and focuses on algorithm design and proofs. It does not provide specific software dependencies with version numbers needed to replicate experiments.
Experiment Setup | No | The paper is theoretical and focuses on algorithm design and proofs. It does not provide specific experimental setup details such as hyperparameter values or training configurations.
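Since the paper provides pseudocode but no released implementation, the following is a minimal, hypothetical sketch in the spirit of noisy gradient descent (cf. the paper's DP-GD, Algorithm 4). The function name `dp_gd`, the least-squares loss, the step size, the clipping rule, and the Gaussian noise calibration here are all illustrative assumptions, not the paper's exact algorithm or constants.

```python
import numpy as np

def dp_gd(data, d, epsilon, delta, T=100, eta=0.1, G=1.0, rng=None):
    """Illustrative noisy gradient descent for (epsilon, delta)-DP ERM.

    Assumes each per-example loss is G-Lipschitz, so replacing one of
    the n examples changes the average gradient by at most 2G/n in L2
    norm. The Gaussian noise scale follows a standard composition-style
    calibration over T iterations (a sketch, not the paper's analysis).
    """
    rng = np.random.default_rng(rng)
    n = len(data)
    sensitivity = 2.0 * G / n
    sigma = sensitivity * np.sqrt(2.0 * T * np.log(1.0 / delta)) / epsilon
    w = np.zeros(d)
    for _ in range(T):
        # Illustrative smooth loss: 0.5 * (x . w - y)^2 per example.
        grads = np.array([(x @ w - y) * x for x, y in data])
        g = grads.mean(axis=0)
        # Clip the average gradient to enforce the assumed bound G.
        norm = np.linalg.norm(g)
        if norm > G:
            g = g * (G / norm)
        # Gaussian mechanism: perturb the gradient before the step.
        w = w - eta * (g + rng.normal(0.0, sigma, size=d))
    return w
```

Note that the noise scale grows like √T / (nε), which matches the high-level intuition behind such bounds: larger data sets n need less noise per step, so the private iterates track the non-private ones more closely.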