DIFF2: Differential Private Optimization via Gradient Differences for Nonconvex Distributed Learning

Authors: Tomoya Murata, Taiji Suzuki

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments are conducted to validate the superiority of the DIFF2 framework.
Researcher Affiliation | Collaboration | NTT DATA Mathematical Systems Inc., Tokyo, Japan; Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan; Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan.
Pseudocode | Yes | Algorithm 1 DIFF2(x_0, σ_1, σ_2, C_1, C_2, T, R, args); Algorithm 2 Clipped_Mean({x_i}_{i∈I}, C); Algorithm 3 GD-Routine(x_{r−1}, ṽ_r, η); Algorithm 4 BVR-L-SGD-Routine(x_{0,r−1}, ṽ_{1,r}, η, b, σ_3, C_3, K). (A simplified code sketch of the clipped-mean and gradient-difference routines appears below the table.)
Open Source Code | No | No explicit statement or link indicating that the source code for the methodology is openly available was found.
Open Datasets | Yes | We conducted regression and classification tasks on five datasets: (i) California Housing Data Set; (ii) Gas Turbine CO and NOx Emission Data Set; (iii) Blog Feedback Data Set; (iv) KDDCup99 Data Set; and (v) Cover Type Dataset.
Dataset Splits | No | For each dataset, we randomly split the original dataset into an 80% train dataset and a 20% test dataset. (No explicit validation split is provided beyond the main training and testing sets.)
Hardware Specification | Yes | CPU: AMD EPYC 7552 48-Core Processor. CPU memory: 1.0 TB.
Software Dependencies | Yes | OS: Ubuntu 16.04.6. Programming language: Python 3.9.12. Deep learning framework: PyTorch 1.12.1.
Experiment Setup | Yes | For each dataset, we randomly split the original dataset into an 80% train dataset and a 20% test dataset. ... We used a one-hidden-layer fully connected neural network with 10 hidden units and softplus activation. For the loss function, we used the squared loss. ... The differential privacy parameters were set to ε_DP ∈ {3.0, 5.0} and δ_DP = 10^{-5}. We used Proposition 4.1 to determine the DP noise size of DP-GD (T = 1, u → 1) and DIFF2-GD (T ∈ {0.003R, 0.01R, 0.03R, 0.1R}, u = 1.25). For tuning the clipping radius and restart interval, we ran DP-GD with C_1 ∈ {1, 3.0, 10.0, 30.0, 100.0} and ran DIFF2-GD with C_1, C_2 ∈ {1, 3.0, 10.0, 30.0, 100.0} and T ∈ {0.003R, 0.01R, 0.03R, 0.1R} for 2,000 rounds. For tuning the learning rate η, we ran each implemented algorithm with η ∈ {0.5^i | i ∈ {0, ..., 9}}. To reduce the execution time, the train loss was evaluated every 20 rounds, and learning was stopped if the chosen learning rate was deemed inappropriate based on the train loss. (A minimal sketch of this setup also appears below the table.)
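The algorithms listed under Pseudocode revolve around two ideas: a clipped-mean aggregation step and a differentially private gradient estimator built from gradient differences that is periodically restarted from clipped full gradients. The following is a minimal single-process sketch of that idea under stated assumptions, not the authors' implementation; the function names, the placement of the Gaussian noise (added once to the aggregate rather than per client), and the flat NumPy parameter vector are simplifications.

```python
import numpy as np

def clipped_mean(vectors, C):
    """Clip each vector to l2 norm at most C, then average (Algorithm 2 style)."""
    clipped = []
    for v in vectors:
        norm = np.linalg.norm(v)
        clipped.append(v * min(1.0, C / max(norm, 1e-12)))
    return np.mean(clipped, axis=0)

def diff2_gd_sketch(local_grads, x0, eta, R, T, C1, C2, sigma1, sigma2, seed=0):
    """Hypothetical sketch of a DIFF2-style outer loop with a plain GD routine.

    local_grads: list of callables, one per client, returning the local gradient at x.
    Every T rounds the private estimator v is rebuilt from clipped full gradients
    (noise scale sigma1); in the other rounds it is updated with the clipped mean
    of gradient *differences* (noise scale sigma2).
    """
    rng = np.random.default_rng(seed)
    x_prev = x0.copy()
    x = x0.copy()
    v = np.zeros_like(x0)
    for r in range(R):
        if r % T == 0:
            grads = [g(x) for g in local_grads]
            v = clipped_mean(grads, C1) + rng.normal(0.0, sigma1, size=x.shape)
        else:
            diffs = [g(x) - g(x_prev) for g in local_grads]
            v = v + clipped_mean(diffs, C2) + rng.normal(0.0, sigma2, size=x.shape)
        x_prev, x = x, x - eta * v  # GD-Routine: one descent step with the estimator
    return x
```

Setting T = 1 forces a restart every round, which recovers a DP-GD-style update as a special case of the sketch.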
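For the Experiment Setup row, the quoted model and tuning grids can be made concrete with a short sketch. The code below only illustrates the described pieces (80/20 split, a one-hidden-layer softplus network with 10 units, squared loss, and the reported hyperparameter grids) on the California Housing dataset; the use of scikit-learn loaders and all variable names are assumptions, not the authors' code.

```python
import torch.nn as nn
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

# 80% train / 20% test split, as described in the setup.
X, y = fetch_california_housing(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# One-hidden-layer fully connected network with 10 hidden units and
# softplus activation, trained with the squared loss.
model = nn.Sequential(nn.Linear(X.shape[1], 10), nn.Softplus(), nn.Linear(10, 1))
loss_fn = nn.MSELoss()

# Hyperparameter grids quoted in the report (the grid values are from the
# text; the surrounding tuning loop would be an illustrative assumption).
etas = [0.5 ** i for i in range(10)]           # learning rates 0.5^i, i = 0..9
clip_radii = [1.0, 3.0, 10.0, 30.0, 100.0]     # C_1 (and C_2 for DIFF2-GD)
restart_fractions = [0.003, 0.01, 0.03, 0.1]   # T as a fraction of the round budget R
eps_dp, delta_dp = 3.0, 1e-5                   # one of the reported DP budgets
```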