Understanding Gradient Clipping in Private SGD: A Geometric Perspective
Authors: Xiangyi Chen, Steven Z. Wu, Mingyi Hong
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we investigate whether the gradient distributions of DP-SGD are approximately symmetric in practice. However, since the gradient distributions are high-dimensional, certifying symmetry is in general intractable. We instead consider two simple proxy measures and visualizations. Setup. We run DP-SGD implemented in TensorFlow on two popular datasets, MNIST [LeCun et al., 2010] and CIFAR-10 [Krizhevsky et al., 2009]. For MNIST, we train a CNN with two convolution layers with 16 4×4 kernels followed by a fully connected layer with 32 nodes. We use DP-SGD to train the model with α = 0.15 and a batch size of 128. For CIFAR-10, we train a CNN with two convolutional layers with 2×2 max pooling of stride 2 followed by a fully connected layer, all using ReLU activation; each layer uses a dropout rate of 0.5. The two convolution layers have 32 and 64 3×3 kernels, respectively, and the fully connected layer has 1500 nodes. We use α = 0.001 and decrease it by a factor of 10 every 20 epochs. The clipping norm in both experiments is set to c = 1 and the noise multiplier is 1.1. |
| Researcher Affiliation | Academia | Xiangyi Chen, University of Minnesota (chen5719@umn.edu); Zhiwei Steven Wu, Carnegie Mellon University (zstevenwu@cmu.edu); Mingyi Hong, University of Minnesota (mhong@umn.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks clearly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper mentions 'DP-SGD implemented in Tensorflow' with a footnote linking to a TensorFlow GitHub repository, which is a third-party tool the authors use, not source code for the methodology developed in this paper. |
| Open Datasets | Yes | We run DP-SGD implemented in TensorFlow on two popular datasets, MNIST [LeCun et al., 2010] and CIFAR-10 [Krizhevsky et al., 2009]. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Tensorflow' but does not provide specific version numbers for it or any other software dependencies, which are required for reproducible descriptions. |
| Experiment Setup | Yes | Setup. We run DP-SGD implemented in TensorFlow on two popular datasets, MNIST [LeCun et al., 2010] and CIFAR-10 [Krizhevsky et al., 2009]. For MNIST, we train a CNN with two convolution layers with 16 4×4 kernels followed by a fully connected layer with 32 nodes. We use DP-SGD to train the model with α = 0.15 and a batch size of 128. For CIFAR-10, we train a CNN with two convolutional layers with 2×2 max pooling of stride 2 followed by a fully connected layer, all using ReLU activation; each layer uses a dropout rate of 0.5. The two convolution layers have 32 and 64 3×3 kernels, respectively, and the fully connected layer has 1500 nodes. We use α = 0.001 and decrease it by a factor of 10 every 20 epochs. The clipping norm in both experiments is set to c = 1 and the noise multiplier is 1.1. (Hedged code sketches of this setup appear after the table.) |
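
The setup quoted above hinges on DP-SGD's per-example gradient clipping followed by Gaussian noising. Below is a minimal NumPy sketch of one such update under the reported hyperparameters (clipping norm c = 1, noise multiplier 1.1, batch size 128). The function name, the flat parameter-vector layout, and the treatment of the paper's α as a step size are illustrative assumptions, not taken from the authors' code.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, lr=0.15):
    """One DP-SGD update for a single flat parameter vector.

    per_example_grads: array of shape (batch_size, num_params), one gradient
    row per training example. Names and layout are illustrative only; lr=0.15
    assumes the paper's α is a step size.
    """
    batch_size = per_example_grads.shape[0]

    # Clip each per-example gradient to L2 norm at most `clip_norm`.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # Sum the clipped gradients, add Gaussian noise calibrated to the
    # clipping norm (std = noise_multiplier * clip_norm), then average.
    noise = np.random.normal(0.0, noise_multiplier * clip_norm,
                             size=per_example_grads.shape[1])
    noisy_mean = (clipped.sum(axis=0) + noise) / batch_size

    # Return the update that would be applied to the parameters.
    return -lr * noisy_mean

# Example: a batch of 128 random per-example gradients over 10 parameters.
grads = np.random.randn(128, 10)
update = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.1)
print(update.shape)  # (10,)
```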
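
The MNIST architecture in the setup (two convolution layers with 16 4×4 kernels and a fully connected layer with 32 nodes) could be written in Keras roughly as follows. The excerpt does not specify activations, padding, pooling, or the output layer, so those choices here (ReLU, a 10-way softmax) are assumptions made only to keep the sketch runnable.

```python
import tensorflow as tf

# A possible reading of the MNIST model described in the setup: two
# convolutional layers with 16 4x4 kernels each, then a fully connected
# layer with 32 nodes. ReLU activations and the 10-class softmax output
# are assumptions, not details stated in the excerpt.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, (4, 4), activation="relu"),
    tf.keras.layers.Conv2D(16, (4, 4), activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```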