DAdaQuant: Doubly-adaptive quantization for communication-efficient Federated Learning

Authors: Robert Hönig, Yiren Zhao, Robert Mullins

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that DAdaQuant consistently improves client→server compression, outperforming the strongest non-adaptive baselines by up to 2.8×.
Researcher Affiliation | Academia | Department of Computer Science, ETH Zurich, Zurich, Switzerland; Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom.
Pseudocode | Yes | Algorithm 1: The FedAvg and DAdaQuant algorithms. The uncolored lines list FedAvg; adding the colored lines creates DAdaQuant (colors mark quantization, client-adaptive quantization, and time-adaptive quantization). (A rough sketch of FedAvg with quantized uploads appears after this table.)
Open Source Code | Yes | Our submission includes a repository with the source code for DAdaQuant and for the experiments presented in this paper.
Open Datasets | Yes | All the datasets used in our experiments are publicly available. To this end, we use DAdaQuant to train a linear model, CNNs and LSTMs of varying complexity on a federated synthetic dataset (Synthetic), as well as two federated image datasets (FEMNIST and CelebA) and two federated natural language datasets (Sent140 and Shakespeare) from the LEAF (Caldas et al., 2018) project for standardized FL research.
Dataset Splits | No | The paper states 'We randomly split the local datasets into 80% training set and 20% test set.' but does not explicitly mention a separate validation set or its proportion.
Hardware Specification | No | The paper does not specify the hardware used for running experiments (e.g., CPU or GPU models, memory, or cloud instances).
Software Dependencies | No | The paper mentions 'We implement the models with PyTorch (Paszke et al., 2019) and use Flower (Beutel et al., 2020) to simulate the FL server and clients.' but does not provide specific version numbers for PyTorch or Flower.
Experiment Setup | Yes | For all experiments, we sample 10 clients each round. We train Synthetic, FEMNIST and CelebA for 500 rounds each. We train Sent140 for 1000 rounds due to slow convergence and Shakespeare for 50 rounds due to rapid convergence. We use batch size 10, learning rates 0.01, 0.003, 0.3, 0.8, 0.1 and µ (FedProx's proximal term coefficient) 1, 1, 1, 0.001, 0 for Synthetic, FEMNIST, Sent140, Shakespeare, CelebA respectively. (These hyperparameters are regrouped in a configuration sketch after this table.)
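
The Pseudocode row above refers to the paper's Algorithm 1, which extends FedAvg with quantized client uploads. As a rough illustration only, the following is a minimal sketch of one FedAvg round in which each sampled client quantizes its update before upload; the `stochastic_quantize` helper, the `q_levels` argument, and the fixed fallback level of 256 are placeholders for this sketch and are not the paper's client-adaptive or time-adaptive quantization rules.

```python
import numpy as np

def stochastic_quantize(update, q):
    """Stochastically round |update| onto q uniform levels of its max magnitude.

    Placeholder quantizer for illustration only; DAdaQuant's actual quantizer
    and its adaptive choice of q are defined in the paper.
    """
    scale = np.max(np.abs(update)) + 1e-12
    levels = np.abs(update) / scale * q          # values in [0, q]
    lower = np.floor(levels)
    round_up = np.random.rand(*update.shape) < (levels - lower)
    return np.sign(update) * (lower + round_up) / q * scale


def fedavg_round(server_weights, client_data, local_train,
                 q_levels=None, clients_per_round=10):
    """One FedAvg round with (placeholder) quantized client-to-server uploads."""
    sampled = np.random.choice(len(client_data), clients_per_round, replace=False)
    total_examples = sum(len(client_data[c]) for c in sampled)
    aggregate = np.zeros_like(server_weights)
    for c in sampled:
        local_weights = local_train(server_weights.copy(), client_data[c])
        update = local_weights - server_weights
        q = q_levels[c] if q_levels is not None else 256  # per-client level (placeholder)
        quantized = stochastic_quantize(update, q)
        # Weight each client's reconstructed model by its share of the sampled data.
        aggregate += (len(client_data[c]) / total_examples) * (server_weights + quantized)
    return aggregate
```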
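
The Experiment Setup row lists the per-dataset hyperparameters in prose. The dictionary below regroups those same numbers per dataset for readability; the key names and structure are hypothetical and are not taken from the authors' repository.

```python
# Hyperparameters from the quoted experiment setup, regrouped per dataset.
# Key names are illustrative only.
EXPERIMENT_SETUP = {
    "clients_per_round": 10,
    "batch_size": 10,
    "datasets": {
        #                rounds   learning rate   FedProx mu
        "Synthetic":   {"rounds": 500,  "lr": 0.01,  "fedprox_mu": 1.0},
        "FEMNIST":     {"rounds": 500,  "lr": 0.003, "fedprox_mu": 1.0},
        "Sent140":     {"rounds": 1000, "lr": 0.3,   "fedprox_mu": 1.0},
        "Shakespeare": {"rounds": 50,   "lr": 0.8,   "fedprox_mu": 0.001},
        "CelebA":      {"rounds": 500,  "lr": 0.1,   "fedprox_mu": 0.0},
    },
}
```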