Noise-Aware Algorithm for Heterogeneous Differentially Private Federated Learning

Authors: Saber Malekmohammadi, Yaoliang Yu, Yang Cao

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on multiple datasets and our theoretical analysis confirm the effectiveness of Robust-HDP. Our code can be found here. We evaluate our proposed method on four benchmark datasets: MNIST (Deng, 2012), FMNIST (Xiao et al., 2017), and CIFAR10/100 (Krizhevsky, 2009) using CNN-based models.
Researcher Affiliation | Collaboration | (1) School of Computer Science, University of Waterloo, Waterloo, Canada; (2) Vector Institute, Toronto, Canada; (3) Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan.
Pseudocode | Yes | Algorithm 1: Robust-HDP; Algorithm 2: Wei Avg (Liu et al., 2021a); Algorithm 3: Principal Component Pursuit by Alternating Directions (Candès et al., 2009). (A NumPy sketch of PCP appears after this table.)
Open Source Code | Yes | Our code can be found here.
Open Datasets | Yes | We evaluate our proposed method on four benchmark datasets: MNIST (Deng, 2012), FMNIST (Xiao et al., 2017), and CIFAR10/100 (Krizhevsky, 2009) using CNN-based models. (An illustrative data-loading snippet follows the table.)
Dataset Splits | Yes | We consider a distributed setting with 20 clients. In order to create a heterogeneous dataset, we follow a similar procedure as in (McMahan et al., 2017): first we split the data from each class into several shards. Then, each user is randomly assigned a number of shards of data. ... In this way, each user has 2400 data points for training and 600 for testing. ... We consider a distributed setting with 20 clients, and split the 50,000 training samples and the 10,000 test samples in the datasets among them. (See the shard-partitioning sketch after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not list specific versions for any software dependencies (e.g., Python version, library versions like PyTorch, TensorFlow, etc.).
Experiment Setup | Yes | We fix δ for all clients to 10⁻⁴. We also set the clipping threshold c equal to 3, as it results in better test accuracy, as reported in (Abadi et al., 2016). For each algorithm and each dataset, we find the best learning rate from a grid: the one which is small enough to avoid divergence of the federated optimization and results in the lowest average train loss (across clients) at the end of FL training. Here are the grids we use for each dataset: MNIST: {1e-4, 2e-4, 5e-4, 1e-3, 2e-3, 5e-3, 1e-2}; FMNIST: {1e-4, 2e-4, 5e-4, 1e-3, 2e-3, 5e-3, 1e-2}; CIFAR10: {1e-4, 2e-4, 5e-4, 1e-3, 2e-3, 5e-3, 1e-2}; CIFAR100: {1e-5, 2e-5, 5e-5, 1e-4, 2e-4, 5e-4, 1e-3}. We consider an FL setting with 20 clients as explained in Appendix B.1, which results in homogeneous {N_i}_{i=1}^n. We also assume full participation and one local epoch for each client (K_i = 1 for all i). We sample {ε_i}_{i=1}^n from a set of distributions, as shown in Table 7 in the Appendix. We also sample batch sizes {b_i}_{i=1}^n uniformly from {16, 32, 64, 128}. (A configuration sketch follows the table.)
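Below is a minimal NumPy sketch of Principal Component Pursuit by alternating directions, the routine listed as Algorithm 3 in the Pseudocode row. It follows the standard formulation of Candès et al.; the function names, iteration cap, and stopping tolerance are our own illustrative choices and are not taken from the paper's pseudocode.

```python
import numpy as np

def shrink(X, tau):
    """Elementwise soft-thresholding (shrinkage) operator."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular value thresholding: shrink the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def pcp(M, max_iter=500, tol=1e-7):
    """Principal Component Pursuit by alternating directions (ADMM).

    Decomposes M into a low-rank part L and a sparse part S, using the
    standard parameter choices lambda = 1/sqrt(max(m, n)) and
    mu = m*n / (4 * ||M||_1).
    """
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))            # sparsity weight lambda
    mu = (m * n) / (4.0 * np.abs(M).sum())    # augmented Lagrangian parameter
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                      # dual variable
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        Y = Y + mu * (M - L - S)
        if np.linalg.norm(M - L - S, "fro") <= tol * np.linalg.norm(M, "fro"):
            break
    return L, S
```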
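The four benchmark datasets named in the Open Datasets row are all publicly available through standard loaders. The snippet below fetches them with torchvision, which is purely our assumption: the paper does not state its software stack (see the Software Dependencies row).

```python
from torchvision import datasets, transforms

# Assumed data pipeline; the paper does not specify one.
to_tensor = transforms.ToTensor()

mnist_train = datasets.MNIST("./data", train=True, download=True, transform=to_tensor)
fmnist_train = datasets.FashionMNIST("./data", train=True, download=True, transform=to_tensor)
cifar10_train = datasets.CIFAR10("./data", train=True, download=True, transform=to_tensor)
cifar100_train = datasets.CIFAR100("./data", train=True, download=True, transform=to_tensor)
```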
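The heterogeneous split described in the Dataset Splits row can be sketched as follows, in the style of McMahan et al. (2017): sort examples by label, cut them into equal shards, and randomly assign a fixed number of shards to each client. The shards-per-client value and the RNG seed are illustrative, since the quoted setup only fixes 20 clients and class-wise shards.

```python
import numpy as np

def shard_split(labels, num_clients=20, shards_per_client=2, seed=0):
    """Non-IID partition: assign each client a few label-sorted shards."""
    rng = np.random.default_rng(seed)
    order = np.argsort(labels)                  # group example indices by class
    num_shards = num_clients * shards_per_client
    shards = np.array_split(order, num_shards)  # equal-sized shards
    shard_ids = rng.permutation(num_shards)     # random shard-to-client assignment
    client_indices = {}
    for c in range(num_clients):
        own = shard_ids[c * shards_per_client:(c + 1) * shards_per_client]
        client_indices[c] = np.concatenate([shards[s] for s in own])
    return client_indices
```

Calling `shard_split(train_labels)` returns a dict mapping each of the 20 clients to the indices of its local examples.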
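Finally, a small configuration sketch collecting the fixed hyperparameters and per-client sampling from the Experiment Setup row. The uniform range used for the per-client privacy budgets ε_i is a placeholder: the actual distributions come from Table 7 of the paper and are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLIENTS = 20
DELTA = 1e-4        # privacy parameter delta, fixed for all clients
CLIP_NORM = 3.0     # gradient clipping threshold c

# Learning-rate grids searched per dataset (values from the paper's setup).
LR_GRIDS = {
    "MNIST":    [1e-4, 2e-4, 5e-4, 1e-3, 2e-3, 5e-3, 1e-2],
    "FMNIST":   [1e-4, 2e-4, 5e-4, 1e-3, 2e-3, 5e-3, 1e-2],
    "CIFAR10":  [1e-4, 2e-4, 5e-4, 1e-3, 2e-3, 5e-3, 1e-2],
    "CIFAR100": [1e-5, 2e-5, 5e-5, 1e-4, 2e-4, 5e-4, 1e-3],
}

# Per-client privacy budgets: placeholder uniform range standing in for
# the distributions listed in Table 7 of the paper.
epsilons = rng.uniform(low=0.5, high=5.0, size=NUM_CLIENTS)

# Per-client batch sizes drawn uniformly from {16, 32, 64, 128}.
batch_sizes = rng.choice([16, 32, 64, 128], size=NUM_CLIENTS)

# Full participation, one local epoch per client (K_i = 1 for all i).
LOCAL_EPOCHS = {i: 1 for i in range(NUM_CLIENTS)}
```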