A Unified Fast Gradient Clipping Framework for DP-SGD

Authors: Weiwei Kong, Andres Muñoz Medina

Venue: NeurIPS 2023

Reproducibility assessment. Each variable below is listed with its result and the supporting LLM response, quoted or summarized from the paper.
Research Type: Experimental
Evidence: "Finally, preliminary numerical experiments are given to demonstrate the substantial effects of the aforementioned improvements. This section presents numerical experiments that compare our proposed adjoint-based framework (Adjoint) against the naïve implementation of DP-SGD (Naive), which computes gradients for each example in a batch, and the classic ghost clipping frameworks (Ghost Clip) that are described in Subsections 5.1 and 5.2."
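The "Naive" baseline mentioned in this passage is the standard per-example gradient approach: each example in the batch gets its own backward pass. A minimal TensorFlow sketch of that baseline (illustrative only; the model shape and names below are ours, not the paper's code):

    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(4)])
    loss_fn = tf.keras.losses.MeanSquaredError()

    def naive_per_example_grads(x_batch, y_batch):
        """Compute one gradient per example by looping over the batch."""
        grads = []
        for i in range(x_batch.shape[0]):
            with tf.GradientTape() as tape:
                pred = model(x_batch[i:i + 1])
                loss = loss_fn(y_batch[i:i + 1], pred)
            grads.append(tape.gradient(loss, model.trainable_variables))
        return grads  # runtime and memory grow with the batch size

This per-example looping is exactly the cost that ghost clipping and the adjoint-based framework are designed to avoid.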
Researcher Affiliation: Industry
Evidence: "Weiwei Kong, Google Research, weiweikong@google.com; Andres Muñoz Medina, Google Research, ammedina@google.com"
Pseudocode: Yes
Evidence: Algorithm 1 (DP-SGD algorithm) and Algorithm 2 (General Gradient Norm Framework).
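Algorithm 1 is the standard DP-SGD step. Below is a minimal sketch of its core operations (clip each per-example gradient to an L2 bound, sum, add Gaussian noise, update); the constants and the helper name are illustrative, and this is the textbook recipe rather than a transcription of the paper's pseudocode. The output of naive_per_example_grads above could serve as its per_example_grads argument.

    import tensorflow as tf

    def dp_sgd_update(per_example_grads, variables, l2_clip=1.0, noise_mult=1.0, lr=0.1):
        """One DP-SGD step: clip per-example gradients, aggregate, add Gaussian noise."""
        batch_size = len(per_example_grads)
        summed = [tf.zeros_like(v) for v in variables]
        for grads in per_example_grads:
            # Global L2 norm over all of this example's gradient tensors.
            norm = tf.sqrt(sum(tf.reduce_sum(tf.square(g)) for g in grads))
            scale = tf.minimum(1.0, l2_clip / (norm + 1e-12))  # enforce norm <= l2_clip
            summed = [s + scale * g for s, g in zip(summed, grads)]
        for v, s in zip(variables, summed):
            noise = tf.random.normal(tf.shape(s), stddev=noise_mult * l2_clip)
            v.assign_sub(lr * (s + noise) / batch_size)

Algorithm 2 (the General Gradient Norm Framework) instead computes the per-example gradient norms directly, layer by layer, so the full per-example gradients never need to be materialized.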
Open Source Code: Yes
Evidence: "To complement the results of this paper, we open-sourced the general interface of the code using the TensorFlow Keras API. By introducing an abstract interface, we also expect practitioners to easily extend the ghost clipping algorithm to any type of layer." (See the repository at https://github.com/google-research/google-research/tree/master/fast_gradient_clipping)
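The abstract interface is organized around per-layer gradient-norm computations. As a rough illustration of the ghost clipping idea for a bias-free dense layer (this is not the repository's API; the function below is ours): the per-example gradient with respect to the weight matrix is the outer product of the layer input and the output gradient, so its Frobenius norm factors into a product of two vector norms.

    import tensorflow as tf

    def dense_per_example_grad_norms(inputs, output_grads):
        """Per-example gradient norms for a bias-free dense layer y = x @ W.

        The per-example gradient w.r.t. W is the outer product of x_i and g_i,
        and its Frobenius norm equals ||x_i|| * ||g_i||, so only two vector
        norms are needed and the gradient itself is never materialized.
        """
        input_norms = tf.norm(inputs, axis=1)              # shape [batch]
        output_grad_norms = tf.norm(output_grads, axis=1)  # shape [batch]
        return input_norms * output_grad_norms             # shape [batch]

Under the paper's abstract-interface design, supporting a new layer type would amount to supplying an analogous per-example norm computation for that layer.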
Open Datasets: No
Evidence: "Each experiment consists of a single training loop of 50 iterations with uniformly sampled random input data of a query size of 5." The paper does not provide concrete access information (a link, DOI, or formal citation) for a publicly available dataset used for training.
Dataset Splits: No
Evidence: The paper mentions using "uniformly sampled random input data" but gives no dataset split information (percentages, sample counts, or methodology) for training, validation, or testing.
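Because the inputs are synthetic, there is nothing to split: data generation reduces to drawing a uniform random batch. A sketch matching the stated query size of 5, reading "query size" as the input dimension (an assumption on our part; the output dimension and helper name are also illustrative):

    import tensorflow as tf

    QUERY_SIZE = 5  # "query size" from the paper, read here as the input dimension (assumption)

    def random_batch(batch_size=1, output_dim=4):
        """Uniformly sampled random inputs and targets; no train/val/test split exists."""
        x = tf.random.uniform((batch_size, QUERY_SIZE))
        y = tf.random.uniform((batch_size, output_dim))
        return x, y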
Hardware Specification: Yes
Evidence: "Each problem instance was run on a cloud computing platform consisting of (i) 112 Intel(R) Xeon(R) Platinum processors with 28 cores each, (ii) 64 GB of RAM, (iii) Python 3.10.11, and (iv) TensorFlow 2.14."
Software Dependencies: Yes
Evidence: The same passage specifies the software stack: (iii) Python 3.10.11 and (iv) TensorFlow 2.14.
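A minimal way to verify the reported software stack before attempting to reproduce the runs (the check itself is ours; the version numbers come from the quote above):

    import sys
    import tensorflow as tf

    # Versions reported in the paper's experimental setup.
    assert sys.version_info[:2] == (3, 10), f"expected Python 3.10.x, got {sys.version}"
    assert tf.__version__.startswith("2.14"), f"expected TensorFlow 2.14.x, got {tf.__version__}"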
Experiment Setup: Yes
Evidence: "We also simplify our computations by utilizing batches of size |B| = 1 for the first two subsections. The loss function used in our experiment, ℓ_x(·), is the mean-squared error. To reduce the variance of the results in the first two subsections, we repeat each problem instance 20 times and report only the median runtime and memory cost over the repetitions. Each experiment consists of a single training loop of 50 iterations with uniformly sampled random input data of a query size of 5."
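Putting the reported setup together, the measurement protocol (batch size 1, MSE loss, 50 iterations per run, 20 repetitions, median statistic) can be sketched as follows. This is our reconstruction, not the paper's benchmarking code: it times a plain SGD loop on random data and omits both the memory accounting and the clipping variant under test (Naive, Ghost Clip, or Adjoint), which would replace the gradient step.

    import statistics
    import time

    import tensorflow as tf

    NUM_ITERATIONS = 50   # training-loop length per run
    NUM_REPETITIONS = 20  # repetitions whose median is reported
    BATCH_SIZE = 1        # |B| = 1 in the first two subsections

    def run_once(model):
        """One run: 50 gradient steps on random data with a mean-squared-error loss."""
        loss_fn = tf.keras.losses.MeanSquaredError()
        opt = tf.keras.optimizers.SGD(learning_rate=0.1)
        for _ in range(NUM_ITERATIONS):
            x = tf.random.uniform((BATCH_SIZE, 5))  # query size 5, per the setup
            y = tf.random.uniform((BATCH_SIZE, 4))  # illustrative target dimension
            with tf.GradientTape() as tape:
                loss = loss_fn(y, model(x))
            grads = tape.gradient(loss, model.trainable_variables)
            opt.apply_gradients(zip(grads, model.trainable_variables))

    runtimes = []
    for _ in range(NUM_REPETITIONS):
        model = tf.keras.Sequential([tf.keras.layers.Dense(4)])
        start = time.perf_counter()
        run_once(model)
        runtimes.append(time.perf_counter() - start)
    print("median runtime (s):", statistics.median(runtimes))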