A Unified Fast Gradient Clipping Framework for DP-SGD
Authors: Weiwei Kong, Andres Munoz Medina
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, preliminary numerical experiments are given to demonstrate the substantial effects of the aforementioned improvements. This section presents numerical experiments that compare our proposed adjoint-based framework (Adjoint) against the naïve implementation of DP-SGD (Naive), which computes gradients for each example in a batch, and the classic ghost clipping frameworks (Ghost Clip) that are described in Subsections 5.1 and 5.2. (A minimal per-example clipping sketch follows the table.) |
| Researcher Affiliation | Industry | Weiwei Kong, Google Research, weiweikong@google.com; Andres Muñoz Medina, Google Research, ammedina@google.com |
| Pseudocode | Yes | Algorithm 1 DP-SGD algorithm; Algorithm 2 General Gradient Norm Framework |
| Open Source Code | Yes | To complement the results of this paper, we open-sourced the general interface of the code using the TensorFlow Keras API. By introducing an abstract interface, we also expect practitioners to easily extend the ghost clipping algorithm to any type of layer. See the repo at https://github.com/google-research/google-research/tree/master/fast_gradient_clipping |
| Open Datasets | No | Each experiment consists of a single training loop of 50 iterations with uniformly sampled random input data of a query size of 5. The paper does not provide concrete access information (link, DOI, formal citation) for a publicly available dataset used for training. |
| Dataset Splits | No | The paper mentions using 'uniformly sampled random input data' but does not provide specific dataset split information (percentages, sample counts, or detailed methodology) for training, validation, or testing. |
| Hardware Specification | Yes | Each problem instance was run on a cloud computing platform consisting of (i) 112 Intel(R) Xeon(R) Platinum processors with 28 cores each, (ii) 64 GB of RAM, (iii) Python 3.10.11, and (iv) TensorFlow 2.14. |
| Software Dependencies | Yes | Each problem instance was run on a cloud computing platform consisting of (i) 112 Intel(R) Xeon(R) Platinum processors with 28 cores each, (ii) 64 GB of RAM, (iii) Python 3.10.11, and (iv) TensorFlow 2.14. |
| Experiment Setup | Yes | We also simplify our computations by utilizing batches of size |B| = 1 for the first two subsections. The loss function used in our experiment, ℓ_x(·), is the mean-squared error. To reduce the variance of the results in the first two subsections, we repeat each problem instance 20 times and report only the median runtime and memory cost over the repetitions. Each experiment consists of a single training loop of 50 iterations with uniformly sampled random input data of a query size of 5. (A benchmark-loop sketch follows the table.) |
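
The Naive baseline described in the Research Type row computes and clips one gradient per example before noising, in the spirit of the paper's Algorithm 1 (DP-SGD). The following is a minimal sketch of such a per-example clipping step in TensorFlow; the model, clip norm, and noise multiplier are illustrative placeholders and are not taken from the paper's open-sourced code.

```python
# Minimal sketch of a "naive" DP-SGD step: one backward pass per example,
# per-example clipping, Gaussian noising, then averaging. All names and
# hyperparameters are illustrative, not the paper's implementation.
import tensorflow as tf

def naive_dp_sgd_step(model, optimizer, x_batch, y_batch,
                      l2_clip_norm=1.0, noise_multiplier=1.0):
    loss_fn = tf.keras.losses.MeanSquaredError()
    summed_grads = [tf.zeros_like(v) for v in model.trainable_variables]

    # One gradient tape per example: the O(|B|) backward passes that
    # ghost clipping and the adjoint-based framework are designed to avoid.
    for i in range(int(x_batch.shape[0])):
        with tf.GradientTape() as tape:
            pred = model(x_batch[i:i + 1], training=True)
            loss = loss_fn(y_batch[i:i + 1], pred)
        grads = tape.gradient(loss, model.trainable_variables)
        norm = tf.sqrt(sum(tf.reduce_sum(tf.square(g)) for g in grads))
        scale = tf.minimum(1.0, l2_clip_norm / (norm + 1e-12))
        summed_grads = [s + scale * g for s, g in zip(summed_grads, grads)]

    # Add Gaussian noise calibrated to the clip norm, then average over the batch.
    batch_size = tf.cast(x_batch.shape[0], tf.float32)
    noisy_grads = [
        (g + tf.random.normal(tf.shape(g),
                              stddev=noise_multiplier * l2_clip_norm)) / batch_size
        for g in summed_grads
    ]
    optimizer.apply_gradients(zip(noisy_grads, model.trainable_variables))
```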
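
The Experiment Setup row describes a single 50-iteration training loop on uniformly sampled random inputs with query size 5, batch size |B| = 1, and an MSE loss, repeated 20 times with the median cost reported. A hedged reconstruction of that harness, with a placeholder model, input dimension, and an ordinary gradient step standing in for whichever clipping method is being timed, might look like:

```python
# Hedged reconstruction of the benchmark harness: 50 iterations, uniform
# random inputs of query size 5, batch size |B| = 1, MSE loss, 20
# repetitions, median runtime. Model and input_dim are placeholders.
import time
import numpy as np
import tensorflow as tf

def run_one_trial(num_iters=50, query_size=5, input_dim=16):
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])  # placeholder layer
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
    loss_fn = tf.keras.losses.MeanSquaredError()
    start = time.perf_counter()
    for _ in range(num_iters):
        x = tf.random.uniform((1, query_size, input_dim))  # |B| = 1
        y = tf.random.uniform((1, query_size, 1))
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return time.perf_counter() - start

# Repeat each problem instance 20 times and report the median runtime.
median_runtime = float(np.median([run_one_trial() for _ in range(20)]))
print(f"median runtime over 20 repetitions: {median_runtime:.3f}s")
```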