DIFF2: Differential Private Optimization via Gradient Differences for Nonconvex Distributed Learning
Authors: Tomoya Murata, Taiji Suzuki
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments are conducted to validate the superiority of the DIFF2 framework. |
| Researcher Affiliation | Collaboration | NTT DATA Mathematical Systems Inc., Tokyo, Japan; Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan; Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan. |
| Pseudocode | Yes | Algorithm 1: DIFF2(x_0, σ_1, σ_2, C_1, C_2, T, R, args); Algorithm 2: Clipped Mean({x_i}_{i∈I}, C); Algorithm 3: GD-Routine(x_{r-1}, ṽ_r, η); Algorithm 4: BVR-L-SGD-Routine(x_{0,r-1}, ṽ_{1,r}, η, b, σ_3, C_3, K). (A minimal sketch of Algorithms 1–3 appears after the table.) |
| Open Source Code | No | No explicit statement or link indicating that the source code for the methodology is openly available was found. |
| Open Datasets | Yes | We conducted regression and classification tasks on five datasets: (i) California Housing Data Set; (ii) Gas Turbine CO and NOx Emission Data Set; (iii) Blog Feedback Data Set; (iv) KDDCup99 Data Set; and (v) Cover Type Dataset. |
| Dataset Splits | No | For each dataset, we randomly split the original dataset into an 80% train dataset and a 20% test dataset. (No explicit validation split is provided beyond the main training and testing sets.) |
| Hardware Specification | Yes | CPU: AMD EPYC 7552 48-Core Processor. CPU Memory: 1.0 TB. |
| Software Dependencies | Yes | OS: Ubuntu 16.04.6. Programming language: Python 3.9.12. Deep learning framework: PyTorch 1.12.1. |
| Experiment Setup | Yes | For each dataset, we randomly split the original dataset into an 80% train dataset and a 20% test dataset. ... We used a one-hidden-layer fully connected neural network with 10 hidden units and softplus activation. For the loss function, we used the squared loss. ... Differential privacy parameters were set to ε_DP ∈ {3.0, 5.0} and δ_DP = 10^{-5}. We used Proposition 4.1 to determine the DP noise size of DP-GD (T = 1, u → 1) and DIFF2-GD (T ∈ {0.003R, 0.01R, 0.03R, 0.1R}, u = 1.25). ... For tuning the clipping radius and restart interval, we ran DP-GD with C_1 ∈ {1, 3.0, 10.0, 30.0, 100.0} and ran DIFF2-GD with C_1, C_2 ∈ {1, 3.0, 10.0, 30.0, 100.0} and T ∈ {0.003R, 0.01R, 0.03R, 0.1R} for 2,000 rounds. For tuning the learning rate η, we ran each implemented algorithm with η ∈ {0.5^i \| i ∈ {0, . . . , 9}}. To reduce the execution time, the train loss was evaluated every 20 rounds and learning was stopped if the used learning rate was deemed inappropriate by checking the train loss. (A sketch of this setup appears after the table.) |
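
The pseudocode row above names the DIFF2 routines. The following is a minimal NumPy sketch of how Algorithms 1–3 fit together, based on a reading of the paper: an estimator is restarted every T rounds from clipped, noised full gradients and otherwise updated with clipped, noised gradient differences, followed by a plain descent step. The function names, the gradient oracle `local_grad`, the toy least-squares data, and all numeric values are illustrative assumptions, and the noise placement is simplified relative to the paper's per-client privacy accounting; this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def clipped_mean(vectors, C):
    """Algorithm 2 (sketch): average the inputs after clipping each to L2 norm at most C."""
    clipped = [v * min(1.0, C / (np.linalg.norm(v) + 1e-12)) for v in vectors]
    return np.mean(clipped, axis=0)

def diff2_gd(x0, grad_fn, n_workers, sigma1, sigma2, C1, C2, T, R, eta):
    """Algorithms 1 and 3 (sketch): DIFF2 with a plain gradient-descent routine.

    Every T rounds the estimator v is restarted from the clipped, noised mean of
    full local gradients; in between, v is refreshed with the clipped, noised mean
    of gradient *differences* between consecutive iterates. Noise is added once to
    the averaged quantity here, which simplifies the paper's privacy accounting.
    """
    d = x0.size
    x_prev, x = x0.copy(), x0.copy()
    v = np.zeros(d)
    for r in range(R):
        if r % T == 0:
            grads = [grad_fn(i, x) for i in range(n_workers)]
            v = clipped_mean(grads, C1) + sigma1 * rng.standard_normal(d)
        else:
            diffs = [grad_fn(i, x) - grad_fn(i, x_prev) for i in range(n_workers)]
            v = v + clipped_mean(diffs, C2) + sigma2 * rng.standard_normal(d)
        x_prev, x = x, x - eta * v  # GD-Routine: one descent step with estimator v
    return x

# Toy usage: distributed least squares with 4 workers holding disjoint row shards.
A = rng.standard_normal((40, 5))
b = rng.standard_normal(40)

def local_grad(i, x):
    Ai, bi = A[i::4], b[i::4]
    return Ai.T @ (Ai @ x - bi) / len(bi)

x_hat = diff2_gd(np.zeros(5), local_grad, n_workers=4, sigma1=0.01, sigma2=0.01,
                 C1=10.0, C2=10.0, T=10, R=200, eta=0.1)
```
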
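The experiment-setup row quotes the 80%/20% split, the small softplus network, and the tuning grids. The snippet below sketches that configuration for the California Housing task, assuming scikit-learn for data loading and splitting (the paper does not state which tooling was used) and PyTorch, which is listed under software dependencies, for the model. Variable names and the grid-search skeleton are assumptions; the DP noise calibration via Proposition 4.1 is not reproduced here.

```python
import itertools
import torch.nn as nn
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

# (i) California Housing, split into 80% train / 20% test as described above.
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# One-hidden-layer fully connected network with 10 hidden units and softplus
# activation, trained with the squared loss.
model = nn.Sequential(nn.Linear(X.shape[1], 10), nn.Softplus(), nn.Linear(10, 1))
loss_fn = nn.MSELoss()

# Privacy parameters and tuning grids quoted in the setup.
eps_dp_grid = [3.0, 5.0]
delta_dp = 1e-5
clip_grid = [1.0, 3.0, 10.0, 30.0, 100.0]   # candidate C_1 (and C_2 for DIFF2-GD)
restart_fracs = [0.003, 0.01, 0.03, 0.1]    # restart interval T as a fraction of R
lr_grid = [0.5 ** i for i in range(10)]     # eta in {0.5^i : i = 0, ..., 9}

R = 2000  # rounds used when tuning the clipping radii and restart interval
for C1, C2, T_frac in itertools.product(clip_grid, clip_grid, restart_fracs):
    T = max(1, round(T_frac * R))
    # ... run DIFF2-GD for R rounds with (C1, C2, T), evaluating the train loss
    #     every 20 rounds and stopping early if the learning rate is clearly bad ...
    pass
```
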