Federated Continual Learning with Weighted Inter-client Transfer

Authors: Jaehong Yoon, Wonyong Jeong, Giwoong Lee, Eunho Yang, Sung Ju Hwang

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our FedWeIT against existing federated learning and continual learning methods under varying degrees of task similarity across clients, and our model significantly outperforms them with a large reduction in the communication cost.
Researcher Affiliation | Collaboration | (1) Korea Advanced Institute of Science and Technology (KAIST), South Korea; (2) AITRICS, South Korea.
Pseudocode | Yes | Algorithm 1: Federated Weighted Inter-client Transfer (an illustrative sketch of the parameter decomposition it builds on appears below the table).
Open Source Code | Yes | Code is available at https://github.com/wyjeong/FedWeIT.
Open Datasets | Yes | We validate our FedWeIT under different configurations of task sequences against baselines, namely Overlapped-CIFAR-100 and NonIID-50. ... MNIST (LeCun et al., 1998), CIFAR-10/-100 (Krizhevsky & Hinton, 2009), SVHN (Netzer et al., 2011), Fashion-MNIST (Xiao et al., 2017), Not-MNIST (Bulatov, 2011), FaceScrub (Ng & Winkler, 2014), and TrafficSigns (Stallkamp et al., 2011). (A toy task-split sketch appears below the table.)
Dataset Splits | Yes | Table A.5: Dataset Details of the NonIID-50 Task. We provide dataset details of the NonIID-50 dataset, including the 8 heterogeneous datasets, the number of sub-tasks, the classes per sub-task, and the instances of the train, valid, and test sets.
Hardware Specification | No | The paper describes the network architectures used (LeNet, ResNet-18) but does not provide specific hardware details such as GPU/CPU models, processors, or memory used for running the experiments.
Software Dependencies | No | The paper mentions using an Adam optimizer but does not specify software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions, CUDA).
Experiment Setup | Yes | We use an Adam optimizer with adaptive learning rate decay, which decays the learning rate by a factor of 3 for every 5 epochs with no consecutive decrease in the validation loss. We stop training in advance and start learning the next task (if available) when the learning rate reaches ρ. For the LeNet experiment with 5 clients, we initialize the learning rate to 1e-3 at the beginning of each new task and set ρ = 1e-7. The mini-batch size is 100, the number of rounds per task is 20, and the number of epochs per round is 1. The setting for ResNet-18 is identical except for the initial learning rate, 1e-4. For the experiments with 20 and 100 clients, we use the same settings except that the mini-batch size is reduced from 100 to 10 and the initial learning rate is 1e-4. We use client fractions of 0.25 and 0.05, respectively, at each communication round.
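
The learning-rate schedule quoted in the Experiment Setup row is compact enough to restate in code. Below is a minimal Python sketch of one plausible reading of it: cut the learning rate by a factor of 3 whenever the validation loss has not improved for 5 consecutive epochs, and stop the current task early once the rate falls below ρ. The class name, the patience bookkeeping, and the toy loss values are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the quoted learning-rate schedule (one plausible reading):
# divide the learning rate by 3 after 5 consecutive epochs without a drop in
# validation loss, and stop the current task once the rate falls below rho.
class AdaptiveLRDecay:
    def __init__(self, lr_init=1e-3, factor=3.0, patience=5, rho=1e-7):
        self.lr = lr_init
        self.factor = factor
        self.patience = patience
        self.rho = rho
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Update after one epoch; return False once training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr /= self.factor
                self.bad_epochs = 0
        return self.lr >= self.rho

# Usage: feed the validation loss after every epoch.
sched = AdaptiveLRDecay()
for val_loss in [0.90, 0.80, 0.81, 0.82, 0.83, 0.84, 0.85]:
    if not sched.step(val_loss):
        break
```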
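
Algorithm 1 (Federated Weighted Inter-client Transfer), referenced in the Pseudocode row, rests on FedWeIT's parameter decomposition: each client composes its task-specific parameters from a shared base parameter modulated by a sparse mask, its own task-adaptive parameter, and an attention-weighted sum of other clients' task-adaptive parameters, while the server aggregates the base parameters. The NumPy sketch below illustrates only that composition; the binary mask, the plain averaging at the server, and all names and shapes are simplifying assumptions rather than the authors' code (see the linked repository for the actual implementation).

```python
# Minimal NumPy sketch of the parameter decomposition behind Algorithm 1:
# theta_c^(t) = B * m_c^(t) + A_c^(t) + sum_i alpha_i * A_i^(j)
# Binary mask, plain averaging, names, and shapes are illustrative assumptions.
import numpy as np

def compose_client_params(B, mask, A_local, foreign_adaptives, alpha):
    """Compose a client's task parameters from the shared base, its sparse mask,
    its own task-adaptive parameter, and weighted foreign task-adaptive params."""
    theta = B * mask + A_local
    for a_i, A_i in zip(alpha, foreign_adaptives):
        theta = theta + a_i * A_i  # weighted inter-client transfer
    return theta

def server_aggregate(base_updates):
    """Server-side step: average the clients' (sparsified) base parameters."""
    return np.mean(np.stack(base_updates), axis=0)

# Toy example: one 3x3 weight matrix, two foreign task-adaptive parameters.
rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3))                      # shared base parameter
mask = (rng.random((3, 3)) > 0.5).astype(float)  # sparse mask (binary here)
A_local = 0.1 * rng.normal(size=(3, 3))          # this client's task-adaptive param
foreign = [0.1 * rng.normal(size=(3, 3)) for _ in range(2)]
alpha = np.array([0.7, 0.3])                     # attention over foreign adaptives

theta = compose_client_params(B, mask, A_local, foreign, alpha)
new_base = server_aggregate([B + 0.01 * rng.normal(size=(3, 3)) for _ in range(5)])
```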
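
The Overlapped-CIFAR-100 and NonIID-50 benchmarks in the Open Datasets row are built by partitioning class labels into sub-tasks and distributing them across clients. The snippet below sketches only the generic class-partitioning step; the function name, the seed, and the example numbers (100 classes, 5 classes per sub-task) are placeholders and do not reproduce the paper's exact task construction or client assignment.

```python
# Illustrative sketch (not the authors' data pipeline): partition a label set
# into disjoint sub-tasks with a fixed number of classes each. The numbers
# below are placeholders, not the paper's exact benchmark construction.
import random

def make_subtasks(num_classes, classes_per_task, seed=0):
    """Shuffle class ids and cut them into disjoint sub-tasks."""
    labels = list(range(num_classes))
    random.Random(seed).shuffle(labels)
    return [labels[i:i + classes_per_task]
            for i in range(0, num_classes, classes_per_task)]

# e.g. 100 classes cut into 20 sub-tasks of 5 classes each
tasks = make_subtasks(num_classes=100, classes_per_task=5)
```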