DIFF2: Differential Private Optimization via Gradient Differences for Nonconvex Distributed Learning

Authors: Tomoya Murata, Taiji Suzuki

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments are conducted to validate the superiority of the DIFF2 framework.
Researcher Affiliation | Collaboration | NTT DATA Mathematical Systems Inc., Tokyo, Japan; Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan; Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan.
Pseudocode | Yes | Algorithm 1 DIFF2(x_0, σ_1, σ_2, C_1, C_2, T, R, args); Algorithm 2 Clipped_Mean({x_i}_{i∈I}, C); Algorithm 3 GD-Routine(x_{r−1}, ṽ_r, η); Algorithm 4 BVR-L-SGD-Routine(x_{0,r−1}, ṽ_{1,r}, η, b, σ_3, C_3, K). (A simplified code sketch of the clipped-mean and gradient-difference routines appears below the table.)
Open Source Code | No | No explicit statement or link indicating that the source code for the methodology is openly available was found.
Open Datasets | Yes | We conducted regression and classification tasks on five datasets: (i) California Housing Data Set; (ii) Gas Turbine CO and NOx Emission Data Set; (iii) Blog Feedback Data Set; (iv) KDDCup99 Data Set; and (v) Cover Type Dataset.
Dataset Splits | No | For each dataset, we randomly split the original dataset into an 80% train dataset and a 20% test dataset. (No explicit validation split is provided beyond the main training and testing sets.)
Hardware Specification | Yes | CPU: AMD EPYC 7552 48-Core Processor. CPU memory: 1.0 TB.
Software Dependencies | Yes | OS: Ubuntu 16.04.6. Programming language: Python 3.9.12. Deep learning framework: PyTorch 1.12.1.
Experiment Setup | Yes | For each dataset, we randomly split the original dataset into an 80% train dataset and a 20% test dataset. ... We used a one-hidden-layer fully connected neural network with 10 hidden units and softplus activation. For the loss function, we used the squared loss. ... The differential privacy parameters were set to ε_DP ∈ {3.0, 5.0} and δ_DP = 10^{-5}. We used Proposition 4.1 to determine the DP noise size of DP-GD (T = 1, u → 1) and DIFF2-GD (T ∈ {0.003R, 0.01R, 0.03R, 0.1R}, u = 1.25). For tuning the clipping radius and restart interval, we ran DP-GD with C_1 ∈ {1, 3.0, 10.0, 30.0, 100.0} and ran DIFF2-GD with C_1, C_2 ∈ {1, 3.0, 10.0, 30.0, 100.0} and T ∈ {0.003R, 0.01R, 0.03R, 0.1R} for 2,000 rounds. For tuning the learning rate η, we ran each implemented algorithm with η ∈ {0.5^i | i ∈ {0, ..., 9}}. To reduce the execution time, the train loss was evaluated every 20 rounds, and learning was stopped if the chosen learning rate was deemed inappropriate based on the train loss. (A minimal sketch of this setup also appears below the table.)
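The algorithms listed under Pseudocode revolve around two ideas: a clipped-mean aggregation step and a differentially private gradient estimator built from gradient differences that is periodically restarted from clipped full gradients. The following is a minimal single-process sketch of that idea under stated assumptions, not the authors' implementation; the function names, the placement of the Gaussian noise (added once to the aggregate rather than per client), and the flat NumPy parameter vector are simplifications.

```python
import numpy as np

def clipped_mean(vectors, C):
    """Clip each vector to l2 norm at most C, then average (Algorithm 2 style)."""
    clipped = []
    for v in vectors:
        norm = np.linalg.norm(v)
        clipped.append(v * min(1.0, C / max(norm, 1e-12)))
    return np.mean(clipped, axis=0)

def diff2_gd_sketch(local_grads, x0, eta, R, T, C1, C2, sigma1, sigma2, seed=0):
    """Hypothetical sketch of a DIFF2-style outer loop with a plain GD routine.

    local_grads: list of callables, one per client, returning the local gradient at x.
    Every T rounds the private estimator v is rebuilt from clipped full gradients
    (noise scale sigma1); in the other rounds it is updated with the clipped mean
    of gradient *differences* (noise scale sigma2).
    """
    rng = np.random.default_rng(seed)
    x_prev = x0.copy()
    x = x0.copy()
    v = np.zeros_like(x0)
    for r in range(R):
        if r % T == 0:
            grads = [g(x) for g in local_grads]
            v = clipped_mean(grads, C1) + rng.normal(0.0, sigma1, size=x.shape)
        else:
            diffs = [g(x) - g(x_prev) for g in local_grads]
            v = v + clipped_mean(diffs, C2) + rng.normal(0.0, sigma2, size=x.shape)
        x_prev, x = x, x - eta * v  # GD-Routine: one descent step with the estimator
    return x
```

Setting T = 1 forces a restart every round, which recovers a DP-GD-style update as a special case of the sketch.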
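For the Experiment Setup row, the quoted model and tuning grids can be made concrete with a short sketch. The code below only illustrates the described pieces (80/20 split, a one-hidden-layer softplus network with 10 units, squared loss, and the reported hyperparameter grids) on the California Housing dataset; the use of scikit-learn loaders and all variable names are assumptions, not the authors' code.

```python
import torch.nn as nn
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

# 80% train / 20% test split, as described in the setup.
X, y = fetch_california_housing(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# One-hidden-layer fully connected network with 10 hidden units and
# softplus activation, trained with the squared loss.
model = nn.Sequential(nn.Linear(X.shape[1], 10), nn.Softplus(), nn.Linear(10, 1))
loss_fn = nn.MSELoss()

# Hyperparameter grids quoted in the report (the grid values are from the
# text; the surrounding tuning loop would be an illustrative assumption).
etas = [0.5 ** i for i in range(10)]           # learning rates 0.5^i, i = 0..9
clip_radii = [1.0, 3.0, 10.0, 30.0, 100.0]     # C_1 (and C_2 for DIFF2-GD)
restart_fractions = [0.003, 0.01, 0.03, 0.1]   # T as a fraction of the round budget R
eps_dp, delta_dp = 3.0, 1e-5                   # one of the reported DP budgets
```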