Understanding Gradient Clipping in Private SGD: A Geometric Perspective

Authors: Xiangyi Chen, Zhiwei Steven Wu, Mingyi Hong

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we investigate whether the gradient distributions of DP-SGD are approximately symmetric in practice. However, since the gradient distributions are high-dimensional, certifying symmetry is in general intractable. We instead consider two simple proxy measures and visualizations. Setup. We run DP-SGD implemented in Tensorflow on two popular datasets, MNIST [LeCun et al., 2010] and CIFAR-10 [Krizhevsky et al., 2009]. For MNIST, we train a CNN with two convolution layers with 16 4×4 kernels followed by a fully connected layer with 32 nodes. We use DP-SGD to train the model with α = 0.15 and a batch size of 128. For CIFAR-10, we train a CNN with two convolutional layers with 2×2 max pooling of stride 2, followed by a fully connected layer, all using ReLU activation; each layer uses a dropout rate of 0.5. The two convolutional layers have 32 and 64 3×3 kernels, respectively, and the fully connected layer has 1500 nodes. We use α = 0.001 and decrease it by a factor of 10 every 20 epochs. The clip norm in both experiments is set to c = 1 and the noise multiplier is 1.1.
Researcher Affiliation | Academia | Xiangyi Chen (University of Minnesota, chen5719@umn.edu); Zhiwei Steven Wu (Carnegie Mellon University, zstevenwu@cmu.edu); Mingyi Hong (University of Minnesota, mhong@umn.edu)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks clearly labeled as 'Pseudocode' or 'Algorithm'.
Open Source Code | No | The paper mentions 'DP-SGD implemented in Tensorflow' with a footnote linking to a TensorFlow GitHub repository; this is a third-party tool, not released source code for the methodology developed in this paper.
Open Datasets | Yes | We run DP-SGD implemented in Tensorflow on two popular datasets, MNIST [LeCun et al., 2010] and CIFAR-10 [Krizhevsky et al., 2009].
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions 'Tensorflow' but does not provide specific version numbers for it or any other software dependencies, which are required for reproducible descriptions.
Experiment Setup | Yes | Setup. We run DP-SGD implemented in Tensorflow on two popular datasets, MNIST [LeCun et al., 2010] and CIFAR-10 [Krizhevsky et al., 2009]. For MNIST, we train a CNN with two convolution layers with 16 4×4 kernels followed by a fully connected layer with 32 nodes. We use DP-SGD to train the model with α = 0.15 and a batch size of 128. For CIFAR-10, we train a CNN with two convolutional layers with 2×2 max pooling of stride 2, followed by a fully connected layer, all using ReLU activation; each layer uses a dropout rate of 0.5. The two convolutional layers have 32 and 64 3×3 kernels, respectively, and the fully connected layer has 1500 nodes. We use α = 0.001 and decrease it by a factor of 10 every 20 epochs. The clip norm in both experiments is set to c = 1 and the noise multiplier is 1.1. (Hedged TensorFlow sketches of this configuration follow the table.)
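
The Experiment Setup row quotes hyperparameters, but the authors released no code, so the following is a minimal sketch rather than their implementation. It assumes the tensorflow_privacy package's DPKerasSGDOptimizer (the paper only says DP-SGD is "implemented in Tensorflow"), reads α = 0.15 as the learning rate, and guesses details the quote omits (activations, no pooling for MNIST). The values it does take from the quote are the two convolution layers with 16 4×4 kernels, the 32-node fully connected layer, batch size 128, clip norm c = 1, and noise multiplier 1.1.

```python
# Minimal sketch of the reported MNIST DP-SGD configuration (not the authors' code).
import tensorflow as tf
import tensorflow_privacy

BATCH_SIZE = 128          # batch size quoted in the setup
L2_NORM_CLIP = 1.0        # clipping norm c = 1
NOISE_MULTIPLIER = 1.1    # noise multiplier quoted in the setup
LEARNING_RATE = 0.15      # assumption: alpha = 0.15 read as the step size

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
# Drop the ragged last batch so every batch divides evenly into microbatches.
n = (len(x_train) // BATCH_SIZE) * BATCH_SIZE
x_train, y_train = x_train[:n], y_train[:n]

# Two convolution layers with 16 4x4 kernels each, then a 32-node dense layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, 4, activation="relu"),
    tf.keras.layers.Conv2D(16, 4, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10),
])

# Per-example gradients are clipped to L2_NORM_CLIP and Gaussian noise with
# std = NOISE_MULTIPLIER * L2_NORM_CLIP is added before the averaged update.
optimizer = tensorflow_privacy.DPKerasSGDOptimizer(
    l2_norm_clip=L2_NORM_CLIP,
    noise_multiplier=NOISE_MULTIPLIER,
    num_microbatches=BATCH_SIZE,   # one microbatch per example
    learning_rate=LEARNING_RATE,
)

# The loss must keep per-example values (no reduction) so each microbatch
# gradient can be clipped separately.
loss = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE)

model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=BATCH_SIZE)
```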
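
For CIFAR-10, the quoted schedule decreases α by a factor of 10 every 20 epochs. Again assuming α is the learning rate, a Keras callback sketch of that step decay:

```python
import tensorflow as tf

# Step decay quoted for CIFAR-10: start at 1e-3 and divide by 10 every 20 epochs.
def step_decay(epoch, current_lr):
    return 1e-3 * (0.1 ** (epoch // 20))

lr_schedule = tf.keras.callbacks.LearningRateScheduler(step_decay)
# Pass to model.fit(..., callbacks=[lr_schedule]) together with the DP-SGD optimizer.
```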