DoCoFL: Downlink Compression for Cross-Device Federated Learning

Authors: Ron Dorfman, Shay Vargaftik, Yaniv Ben-Itzhak, Kfir Yehuda Levy

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive evaluation, we show that DoCoFL offers significant bi-directional bandwidth reduction while achieving competitive accuracy to that of a baseline without any compression. We cover a wide range of use cases that include two image classification and two language processing tasks with different configurations and data partitioning, as shortly summarized in Table 2 and further detailed in Appendix F.
Researcher Affiliation | Collaboration | 1VMware Research, 2Viterbi Faculty of Electrical and Computer Engineering, Technion, Haifa, Israel. Correspondence to: Ron Dorfman <rdorfman@campus.technion.ac.il>.
Pseudocode | Yes | Algorithm 1 DoCoFL Parameter Server, Algorithm 2 DoCoFL Client i, Algorithm 3 Meta-Algorithm (generalization of DoCoFL), Algorithm 4 Entropy-Constrained Uniform Quantization (ECUQ). (A hedged sketch of ECUQ appears after this table.)
Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide a link to a code repository.
Open Datasets | Yes | We use the CIFAR-100 and EMNIST datasets. For CIFAR-100 (Krizhevsky et al., 2009), the data distribution among the clients is i.i.d. For EMNIST (Cohen et al., 2017)... For language processing, we perform a sentiment analysis task on the Amazon Reviews dataset (Zhang et al., 2015) with i.i.d. data partitioning; and a next-character prediction task on the Shakespeare dataset (McMahan et al., 2017)... (A dataset-loading sketch appears after this table.)
Dataset Splits | No | The paper mentions using train and validation data (e.g., 'reduced the amount of train and validation data for each speaker', 'best validation accuracy', 'validation accuracy throughout training'), but it does not specify explicit percentages or counts for the splits, nor does it cite a standard split for the overall train/validation partitioning; only the distribution of data across clients is described.
Hardware Specification | No | The paper mentions 'edge devices' and 'low-resourced clients' when describing the problem setting, but it does not provide specific hardware details (CPU/GPU models, memory, or cloud instance types) for running its experiments.
Software Dependencies | No | The paper states 'We implemented DoCoFL in PyTorch (Paszke et al., 2019)', but does not provide a specific version number for PyTorch or any other software dependency.
Experiment Setup | Yes | In all experiments, the PS uses Momentum SGD as optimizer with a momentum of 0.9 and L2 regularization (i.e., weight decay) with parameter 10^-5. The clients, on the other hand, use vanilla SGD for all tasks but Amazon Reviews, for which Adam provided better results. In Table 4 we report the hyperparameters used in our experiments. Table 4 (hyperparameters for our experiments): EMNIST: batch size 64, SGD, client lr 0.05, server lr 1; CIFAR-100: batch size 128, SGD, client lr 0.05, server lr 1; Amazon Reviews: batch size 64, Adam, client lr 0.005, server lr 0.1; Shakespeare: batch size 4, SGD, client lr 0.5, server lr 1. (A hedged optimizer-configuration sketch based on these values appears after this table.)
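
The Pseudocode row lists Algorithm 4, Entropy-Constrained Uniform Quantization (ECUQ). The following is a minimal sketch of one plausible reading of that name, not the paper's algorithm: uniformly quantize a vector with as many levels as possible while the empirical entropy of the quantized values (the rate an ideal entropy coder would need) stays within a per-coordinate bit budget. The function names, the binary search over the number of levels, and the monotonicity assumption are all illustrative choices.

```python
# Illustrative sketch only; not the paper's Algorithm 4.
import numpy as np

def _entropy_bits(indices: np.ndarray) -> float:
    """Empirical entropy (bits per coordinate) of the quantization indices."""
    _, counts = np.unique(indices, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def ecuq(x: np.ndarray, bit_budget: float, max_levels: int = 2**12) -> np.ndarray:
    """Return a dequantized approximation of x whose index entropy is <= bit_budget bits."""
    lo_val, hi_val = float(x.min()), float(x.max())
    if hi_val == lo_val:                       # constant vector: nothing to quantize
        return x.copy()
    best_idx = np.zeros_like(x, dtype=np.int64)  # fallback: a single level at lo_val
    best_step = hi_val - lo_val
    lo, hi = 2, max_levels                     # binary search over the number of levels,
    while lo <= hi:                            # assuming entropy grows with the level count
        k = (lo + hi) // 2
        step = (hi_val - lo_val) / (k - 1)
        idx = np.round((x - lo_val) / step).astype(np.int64)
        if _entropy_bits(idx) <= bit_budget:
            best_idx, best_step = idx, step    # feasible: try a finer grid
            lo = k + 1
        else:
            hi = k - 1                         # over budget: coarsen
    return best_idx * best_step + lo_val
```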
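
The Open Datasets row quotes CIFAR-100, EMNIST, Amazon Reviews, and Shakespeare. As a small availability check, the two image datasets can be downloaded with torchvision; this is a hedged illustration rather than the paper's data pipeline, and the root path, EMNIST split name, and transform are assumptions. The federated, per-client partitioning described in the quote is not reproduced here.

```python
# Availability check for the two image datasets; not the paper's data pipeline.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

cifar100_train = datasets.CIFAR100(root="./data", train=True,
                                   download=True, transform=to_tensor)
emnist_train = datasets.EMNIST(root="./data", split="byclass", train=True,
                               download=True, transform=to_tensor)

print(len(cifar100_train), len(emnist_train))
```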
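
The Experiment Setup row quotes the optimizer choices and Table 4 hyperparameters. Below is a minimal PyTorch sketch that wires those quoted values into server and client optimizers; the TASK_HPARAMS dictionary, function names, and any surrounding training loop are assumptions, while the learning rates, batch sizes, momentum 0.9, and weight decay 1e-5 come from the quoted text.

```python
# Sketch of the quoted optimizer configuration; not the authors' code.
import torch
import torch.nn as nn

TASK_HPARAMS = {
    # task: (batch_size, client_optimizer, client_lr, server_lr) from Table 4
    "EMNIST":         (64,  "sgd",  0.05,  1.0),
    "CIFAR-100":      (128, "sgd",  0.05,  1.0),
    "Amazon Reviews": (64,  "adam", 0.005, 0.1),
    "Shakespeare":    (4,   "sgd",  0.5,   1.0),
}

def make_server_optimizer(global_model: nn.Module, task: str) -> torch.optim.Optimizer:
    """Parameter-server optimizer: Momentum SGD, momentum 0.9, weight decay 1e-5."""
    _, _, _, server_lr = TASK_HPARAMS[task]
    return torch.optim.SGD(global_model.parameters(), lr=server_lr,
                           momentum=0.9, weight_decay=1e-5)

def make_client_optimizer(local_model: nn.Module, task: str) -> torch.optim.Optimizer:
    """Client optimizer: vanilla SGD everywhere except Amazon Reviews, which uses Adam."""
    _, opt_name, client_lr, _ = TASK_HPARAMS[task]
    if opt_name == "adam":
        return torch.optim.Adam(local_model.parameters(), lr=client_lr)
    return torch.optim.SGD(local_model.parameters(), lr=client_lr)
```

In a FedOpt-style loop the server optimizer would be applied to the aggregated client update treated as a pseudo-gradient; that wiring is omitted here since the paper excerpt does not specify it.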