Flash: Concept Drift Adaptation in Federated Learning
Authors: Kunjal Panchal, Sunav Choudhary, Subrata Mitra, Koyel Mukherjee, Somdeb Sarkhel, Saayan Mitra, Hui Guan
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We theoretically prove that FLASH matches the convergence rate of state-of-the-art adaptive optimizers and further empirically evaluate the efficacy of FLASH on a variety of FL benchmarks using different concept drift settings. |
| Researcher Affiliation | Collaboration | 1. University of Massachusetts, Amherst, USA; 2. Adobe Research, Bangalore, India; 3. Adobe Research, San Jose, USA. |
| Pseudocode | Yes | Algorithm 1 provides the pseudo-code for FLASH. |
| Open Source Code | Yes | The implementation is available online (footnote 1: Source Code). |
| Open Datasets | Yes | We have a convex task: Classification of Synthetic data (Li et al., 2020)... EMNIST (Cohen et al., 2017) Image Classification... CIFAR10/100 (Krizhevsky et al., 2009)... |
| Dataset Splits | Yes | As a stopping criterion, we use the decrement in the validation loss value (see Line 7) to indicate when a local model w_c^(r) reaches its steady state. If the validation loss stops decreasing by a threshold γ/e, where γ is a threshold hyperparameter and e is the current epoch count, we stop training for the client. (A sketch of this stopping rule follows the table.) |
| Hardware Specification | Yes | We use an NVIDIA 2080 Ti GPU to run all the experiments, with 3 runs each. |
| Software Dependencies | No | We use the Flower (Beutel et al., 2020) library to implement FLASH and all its baselines. The paper mentions using TensorFlow Federated datasets but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Hyperparameter details are given in Appendix A. ...We have trained each of our baselines and FLASH for R = 1500 rounds, with a batch size of N = 20 instances and 10 clients per round. All the experiments have run for E = 10... The default learning rates for all the experiments are ηℓ = 0.05 and ηg = 1.00, although SCAFFOLD and FEDDYN required ηℓ = 0.03. For both FEDPROX and FEDDYN, λ was assigned 0.001. APFL has α = 0.25, and DITTO has λ = 0.1 with a client learning rate of ηℓ = 0.01. α in FEDDC was assigned 0.5, while ρ in FEDNOVA was assigned 0.8. FLASH has γ = 0.04. (A consolidated configuration sketch follows the table.) |
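
The Dataset Splits row quotes a per-client early-stopping rule: local training halts once the validation-loss improvement drops below γ/e, where e is the current epoch count. The snippet below is a minimal sketch of that rule under stated assumptions; `run_epoch` and `val_loss` are hypothetical hooks standing in for the client's training and evaluation steps, and this is not the authors' implementation.

```python
from typing import Callable


def local_train_with_early_stop(
    run_epoch: Callable[[], None],   # runs one local training epoch (hypothetical hook)
    val_loss: Callable[[], float],   # returns the current validation loss (hypothetical hook)
    max_epochs: int = 10,            # E = 10 in the paper's setup
    gamma: float = 0.04,             # FLASH's threshold hyperparameter
) -> int:
    """Train until the validation-loss improvement falls below gamma / e.

    Returns the number of local epochs actually run.
    """
    prev_loss = float("inf")
    for e in range(1, max_epochs + 1):
        run_epoch()
        loss = val_loss()
        # Steady state: the drop in validation loss is smaller than the
        # epoch-scaled threshold gamma / e, so local training stops early.
        if prev_loss - loss < gamma / e:
            return e
        prev_loss = loss
    return max_epochs
```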
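
For reference, the Experiment Setup row's hyperparameters can be collected into a single configuration. The dictionary keys below are illustrative names chosen for readability, not identifiers from the released code; the values are taken from the paper's text as quoted above.

```python
# Illustrative consolidation of the quoted hyperparameters; key names are assumptions.
EXPERIMENT_CONFIG = {
    "rounds": 1500,              # R, communication rounds
    "batch_size": 20,            # N, instances per batch
    "clients_per_round": 10,
    "local_epochs": 10,          # E
    "client_lr": 0.05,           # eta_l (0.03 for SCAFFOLD/FedDyn; 0.01 for Ditto)
    "server_lr": 1.00,           # eta_g
    "fedprox_lambda": 0.001,
    "feddyn_lambda": 0.001,
    "apfl_alpha": 0.25,
    "ditto_lambda": 0.1,
    "feddc_alpha": 0.5,
    "fednova_rho": 0.8,
    "flash_gamma": 0.04,         # threshold used by the stopping rule sketched above
}
```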