The Fundamental Price of Secure Aggregation in Differentially Private Federated Learning

Authors: Wei-Ning Chen, Christopher A. Choquette-Choo, Peter Kairouz, Ananda Theertha Suresh

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we evaluate our proposed scheme on real-world federated learning tasks. We find that our theoretical analysis is well matched in practice.
Researcher Affiliation | Collaboration | ¹Stanford University, ²Google Research.
Pseudocode | Yes | Algorithm 1 The DDG mechanism; Algorithm 2 Private DME with random projection; Algorithm 3 Gradient Count-Mean Sketch Encoding; Algorithm 4 Gradient Count-Mean Sketch Decoding; Algorithm 5 Distributed discrete Gaussian mechanism DDGenc (with detailed parameters); Algorithm 6 Sparse DME via Compressed Sensing. (A simplified illustration of DDG-style encoding appears after this table.)
Open Source Code | Yes | The code is available at https://github.com/google-research/federated/tree/master/private_linear_compression.
Open Datasets | Yes | We run experiments on the full Federated EMNIST, Stack Overflow, and Shakespeare datasets, three common benchmarks for FL tasks (Caldas et al., 2018).
Dataset Splits | Yes | F-EMNIST has 62 classes and N = 3400 clients, with each user holding both a train and a test set of examples. In total, there are 671,585 training examples and 77,483 test examples.
Hardware Specification | No | The paper mentions running experiments and training models (e.g., a '10^6 parameter CNN', a '4x10^6 parameter LSTM model') but does not specify any hardware details such as GPU/CPU models, memory, or cloud instances used.
Software Dependencies | No | The paper describes training procedures (e.g., 'SGD', 'geometric adaptive clipping', 'Discrete Fourier Transform') and hyperparameter values, but it does not list specific software libraries or frameworks with their version numbers (e.g., 'PyTorch 1.9', 'TensorFlow 2.x').
Experiment Setup | Yes | On F-EMNIST, we use a server learning rate of 1, normalized by n (the number of clients), and momentum of 0.9 (Polyak, 1964); the client uses a learning rate of 0.01 without momentum. On Stack Overflow, we use a server learning rate of 1.78 normalized by n and momentum of 0.9; the client uses a learning rate of 0.3. For distributed DP, we use the geometric adaptive clipping of (Andrew et al., 2019) with an initial ℓ2 clipping norm of 0.1 and a target quantile of 0.5. We use the same procedure as (Kairouz et al., 2021a): we flatten using the Discrete Fourier Transform, pick β = exp(−0.5) as the conditional randomized rounding bias, and use a modular clipping target probability of 6.33e−5, or 4 standard deviations at the server (assuming normally distributed updates). We communicate 16 bits per parameter for F-EMNIST and 18 bits for SONWP unless otherwise indicated.
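
To make the quoted training configuration easier to scan, the sketch below collects the reported hyperparameters into a single illustrative structure and checks that the stated modular clipping probability of 6.33e−5 matches a two-sided 4-standard-deviation Gaussian tail. All names here (e.g., ExperimentConfig) are hypothetical and are not taken from the paper or its code release.

```python
import math
from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    # Hypothetical container for the hyperparameters quoted above;
    # field names are illustrative, not taken from the released code.
    server_lr: float                  # normalized by the number of clients n
    server_momentum: float
    client_lr: float
    initial_l2_clip: float = 0.1      # geometric adaptive clipping (Andrew et al., 2019)
    target_quantile: float = 0.5
    rounding_bias: float = math.exp(-0.5)  # conditional randomized rounding bias beta
    bits_per_parameter: int = 16

F_EMNIST = ExperimentConfig(server_lr=1.0, server_momentum=0.9, client_lr=0.01)
STACK_OVERFLOW = ExperimentConfig(server_lr=1.78, server_momentum=0.9,
                                  client_lr=0.3, bits_per_parameter=18)

# The modular clipping target probability "6.33e-5 or 4 standard deviations"
# corresponds to the two-sided Gaussian tail mass beyond 4 sigma:
#   2 * (1 - Phi(4)) = erfc(4 / sqrt(2)) ~= 6.33e-5.
tail_mass = math.erfc(4 / math.sqrt(2))
print(f"{tail_mass:.3e}")  # -> 6.334e-05
```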
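
For readers unfamiliar with the algorithms listed in the Pseudocode row, the snippet below gives a heavily simplified, illustrative picture of what a distributed discrete Gaussian (DDG) style per-client encoding involves: clip, quantize to an integer grid, add integer noise, and reduce modulo the secure-aggregation field. It is not the paper's Algorithm 1 or 5; in particular, it substitutes plain rounding for conditional randomized rounding and a rounded continuous Gaussian for a true discrete Gaussian sampler, and it omits the flattening transform.

```python
import numpy as np

def ddg_style_encode(x, clip_norm, scale, noise_stddev, modulus, rng=None):
    """Illustrative per-client encoding in the spirit of a DDG mechanism.

    NOTE: plain rounding stands in for conditional randomized rounding, and
    a rounded continuous Gaussian stands in for a true discrete Gaussian;
    the paper's Algorithms 1 and 5 give the exact procedure.
    """
    rng = np.random.default_rng() if rng is None else rng
    # 1. Clip the update to a bounded L2 norm.
    x = x * min(1.0, clip_norm / (np.linalg.norm(x) + 1e-12))
    # 2. Scale and round onto an integer grid.
    z = np.round(x * scale).astype(np.int64)
    # 3. Add integer-valued noise (stand-in for discrete Gaussian noise).
    z = z + np.round(rng.normal(0.0, noise_stddev, size=z.shape)).astype(np.int64)
    # 4. Reduce modulo the field size used by secure aggregation; the server
    #    sums these residues, re-centers the result, and rescales by 1/(n*scale).
    return np.mod(z, modulus)
```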