Gradient Perturbation is Underrated for Differentially Private Convex Optimization

Authors: Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

IJCAI 2020

Reproducibility assessment (each variable's result, followed by the supporting LLM response):

Research Type: Experimental
    "Finally, our extensive experiments suggest that gradient perturbation with the advanced composition method indeed outperforms other perturbation approaches by a large margin, matching our theoretical findings."

Researcher Affiliation: Collaboration
    "1 The School of Data and Computer Science, Sun Yat-sen University; Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou 510006, P.R. China. 2 Microsoft Research Asia, Beijing, China."

Pseudocode: Yes
    "Algorithm 1 DP-GD; Algorithm 2 DP-SGD"

Open Source Code: No
    The paper does not provide any links or statements about releasing open-source code for the described methodology.

Open Datasets: Yes
    "We present the results of four benchmark datasets in [Iyengar et al., 2019], including one multi-class dataset (MNIST) and two with high dimensional features (Real-sim, RCV1). Detailed description of datasets can be found in Table 3."

Dataset Splits: Yes
    "We use 80% data for training and the rest for testing, the same as [Iyengar et al., 2019]."

Hardware Specification: No
    The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.

Software Dependencies: No
    The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers).

Experiment Setup: Yes
    "Running step T is chosen from {50, 200, 800} for both DP-GD and DP-SGD. The standard deviation of the added noise σ is set to be the smallest value such that the privacy budget is allowable to run desired steps. Clipping threshold is set as 1 (0.5 for high dimensional datasets because of the sparse gradient). Privacy parameter δ is set as 1/n^2. The l2 regularization coefficient is set as 1e-4. For DP-GD, learning rate is chosen from {0.1, 1.0, 5.0} ({0.2, 2.0, 10.0} for high dimensional datasets). For DP-SGD, we use moments accountant to track the privacy loss and the sampling ratio is set as 0.1 (roughly the mini-batch size is 0.1 dataset size). The learning rate of DP-SGD is twice as large as DP-GD and it is divided by 2 at the middle of training. All reported numbers are averaged over 20 runs."
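The pseudocode the report points to (Algorithm 1, DP-GD) is standard gradient perturbation: full-batch gradient descent where each example's gradient is clipped in l2 norm and Gaussian noise is added before the update. A minimal sketch of that idea, assuming l2-regularized logistic regression with labels in {-1, +1}; the function name, defaults, and loss choice are illustrative, not the paper's exact interface:

```python
import numpy as np

def dp_gd(X, y, T=50, lr=0.1, clip=1.0, sigma=4.0, reg=1e-4, seed=0):
    """Sketch of gradient perturbation (DP-GD style), not the paper's code.

    Each step: compute per-example logistic-loss gradients, clip each to
    l2 norm <= clip, sum, add Gaussian noise of scale sigma * clip, then
    take an averaged step with l2 regularization.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(T):
        # per-example gradients of log(1 + exp(-y * w.x)), labels in {-1, +1}
        margins = y * (X @ w)
        g = -(y / (1.0 + np.exp(margins)))[:, None] * X  # shape (n, d)
        # clip each example's gradient to l2 norm <= clip
        norms = np.linalg.norm(g, axis=1, keepdims=True)
        g = g / np.maximum(1.0, norms / clip)
        # noisy sum of clipped gradients; sensitivity of the sum is `clip`
        noisy_sum = g.sum(axis=0) + rng.normal(0.0, sigma * clip, size=d)
        w -= lr * (noisy_sum / n + reg * w)
    return w
```

DP-SGD (Algorithm 2) follows the same template but clips and perturbs a sampled mini-batch per step, with privacy tracked by the moments accountant.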
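The quoted setup picks the smallest noise scale σ whose cumulative privacy cost over T steps still fits the (ε, δ) budget. A minimal illustration of that calibration, using basic composition and the classical Gaussian-mechanism bound rather than the paper's advanced composition or moments accountant (both of which permit substantially smaller σ for the same budget); `sigma_for_budget` and its defaults are our own, hypothetical names:

```python
import math

def sigma_for_budget(eps, delta, T, sensitivity=1.0):
    """Toy noise calibration, NOT the paper's accountant.

    Split the (eps, delta) budget evenly across T steps (basic
    composition), then size each step's Gaussian noise with the classical
    bound sigma >= sensitivity * sqrt(2 * ln(1.25 / delta_0)) / eps_0,
    which is valid for per-step eps_0 in (0, 1).
    """
    eps0, delta0 = eps / T, delta / T
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta0)) / eps0
```

Under this crude accounting σ grows roughly linearly in T, which is exactly why tighter composition methods matter: they let DP-GD/DP-SGD run more steps at the same noise level.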