Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization

Authors: Jianyu Wang, Qinghua Liu, Hao Liang, Gauri Joshi, H. Vincent Poor

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 6 Experimental Results. We evaluate all algorithms on two setups with non-IID data partitioning: (1) Logistic Regression on a Synthetic Federated Dataset: The dataset Synthetic(1, 1) is originally constructed in [38]. (2) DNN trained on a Non-IID partitioned CIFAR-10 dataset: We train a VGG-11 [52] network on the CIFAR-10 dataset [53], which is partitioned across 16 clients using a Dirichlet distribution Dir16(0.1), as done in [54]. The original CIFAR-10 test set (without partitioning) is used to evaluate the generalization performance of the trained global model.
Researcher Affiliation | Academia | Jianyu Wang (Carnegie Mellon University, Pittsburgh, PA 15213, jianyuw1@andrew.cmu.edu); Qinghua Liu (Princeton University, Princeton, NJ 08544, qinghual@princeton.edu); Hao Liang (Carnegie Mellon University, Pittsburgh, PA 15213, hliang2@andrew.cmu.edu); Gauri Joshi (Carnegie Mellon University, Pittsburgh, PA 15213, gaurij@andrew.cmu.edu); H. Vincent Poor (Princeton University, Princeton, NJ 08544, poor@princeton.edu)
Pseudocode | Yes | We provide its pseudo-code in the Appendix.
Open Source Code | Yes | Our code is available at: https://github.com/JYWa/FedNova.
Open Datasets | Yes | The dataset Synthetic(1, 1) is originally constructed in [38]. ... We train a VGG-11 [52] network on the CIFAR-10 dataset [53], which is partitioned across 16 clients using a Dirichlet distribution Dir16(0.1), as done in [54].
Dataset Splits | No | The paper mentions training on CIFAR-10 and evaluating on the CIFAR-10 test set, but does not explicitly state or describe a validation set or how data was split for training, validation, and testing.
Hardware Specification | No | The paper does not provide specific details on the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | The local learning rate η is decayed by a constant factor after finishing 50% and 75% of the communication rounds. The initial value of η is tuned separately for FedAvg with different local solvers. On CIFAR-10, we run each experiment with 3 random seeds and report the average and standard deviation. In FedProx, we set µ = 1, the best value reported in [38]. Clients perform GD with η = 0.05, which is decayed by a factor of 5 at rounds 600 and 900. Middle: Only C = 0.3 fraction of clients are randomly selected per round to perform Ei = 5 local epochs. Right: Only C = 0.3 fraction of clients are randomly selected per round to perform random and time-varying local epochs Ei(t) ~ U(1, 5).
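
The sketches below elaborate on the Research Type / Open Datasets, Pseudocode / Open Source Code, and Experiment Setup rows above. First, a minimal sketch of the Dir16(0.1) non-IID partition of CIFAR-10 across 16 clients: for each class, client proportions are drawn from a symmetric Dirichlet with concentration 0.1 and the class indices are split accordingly. The function name and seeding are illustrative; the authors' repository may implement the split differently.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=16, alpha=0.1, seed=0):
    """Split sample indices across clients using a per-class Dirichlet prior.

    A common way to reproduce the Dir_16(0.1) non-IID CIFAR-10 partition
    described in the paper; the exact sampling code used by the authors
    may differ (see https://github.com/JYWa/FedNova).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Proportion of class-c samples assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert proportions to split points and distribute the indices.
        split_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, shard in enumerate(np.split(idx, split_points)):
            client_indices[client_id].extend(shard.tolist())
    return [np.array(ci) for ci in client_indices]

# Example: partition the CIFAR-10 training labels (50,000 samples, 10 classes).
# labels = np.array(torchvision.datasets.CIFAR10(root, train=True).targets)
# parts = dirichlet_partition(labels, num_clients=16, alpha=0.1)
```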
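The Pseudocode and Open Source Code rows point to the FedNova algorithm itself. Below is a hedged sketch of the server-side normalized-averaging step for the plain local-SGD case, following the update x ← x − τ_eff Σ_i p_i (x − y_i)/τ_i with τ_eff = Σ_i p_i τ_i (one of the choices discussed in the paper). The function name and the PyTorch state-dict handling are assumptions, and momentum and proximal variants are omitted; check the linked repository for the authors' implementation.

```python
import copy
import torch

def fednova_server_update(global_model, client_states, num_steps, data_sizes):
    """One round of normalized averaging (plain local-SGD case, sketch only).

    client_states : list of client state_dicts y_i after local training
    num_steps     : list of local step counts tau_i
    data_sizes    : list of client dataset sizes (used to form weights p_i)
    """
    weights = torch.tensor(data_sizes, dtype=torch.float32)
    p = weights / weights.sum()                  # p_i: relative dataset size
    tau = torch.tensor(num_steps, dtype=torch.float32)
    tau_eff = torch.sum(p * tau)                 # effective number of steps

    new_state = copy.deepcopy(global_model.state_dict())
    for key, x in new_state.items():
        # Normalized update d_i = (x - y_i) / tau_i, aggregated with weights p_i.
        d = torch.zeros_like(x, dtype=torch.float32)
        for p_i, tau_i, y_i in zip(p, tau, client_states):
            d += p_i * (x.float() - y_i[key].float()) / tau_i
        new_state[key] = (x.float() - tau_eff * d).to(x.dtype)
    return new_state
```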
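Finally, a sketch of the experiment-setup details quoted in the last row: step decay of the local learning rate after 50% and 75% of the communication rounds, sampling a C = 0.3 fraction of the 16 clients per round, and drawing time-varying local epochs Ei(t) ~ U(1, 5). The decay factor of 10, the ceil-based rounding of the client fraction, and the integer-valued epoch draw are assumptions; the paper states only "a constant factor" and U(1, 5).

```python
import math
import random

def lr_at_round(initial_lr, round_idx, total_rounds, decay_factor=10.0):
    """Step decay after 50% and 75% of the communication rounds.

    The paper says the learning rate is decayed by "a constant factor";
    the factor of 10 here is an assumption (the GD runs instead use a
    factor of 5 at rounds 600 and 900).
    """
    lr = initial_lr
    if round_idx >= 0.5 * total_rounds:
        lr /= decay_factor
    if round_idx >= 0.75 * total_rounds:
        lr /= decay_factor
    return lr

def sample_round_config(num_clients=16, client_fraction=0.3, rng=random):
    """Pick participating clients and their local epochs for one round.

    Matches the 'Middle'/'Right' settings quoted above: a C = 0.3 fraction
    of clients participates, and in the time-varying setting each selected
    client draws Ei(t) uniformly from {1, ..., 5} (integer draw assumed).
    """
    num_selected = max(1, math.ceil(client_fraction * num_clients))  # rounding is an assumption
    selected = rng.sample(range(num_clients), num_selected)
    local_epochs = {i: rng.randint(1, 5) for i in selected}  # Ei(t) ~ U(1, 5)
    return selected, local_epochs
```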