Continual Federated Learning Based on Knowledge Distillation

Authors: Yuhang Ma, Zhongle Xie, Jue Wang, Ke Chen, Lidan Shou

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate two scenarios of CFL by varying the data distribution and adding categories on text and image classification datasets. CFeD outperforms existing FL methods in overcoming forgetting without sacrificing the ability to learn new tasks.
Researcher Affiliation | Academia | Yuhang Ma1, Zhongle Xie1, Jue Wang1, Ke Chen1 and Lidan Shou1,2; 1College of Computer Science and Technology, Zhejiang University; 2State Key Laboratory of CAD&CG, Zhejiang University; {myh0032, xiezl, zjuwangjue, chenk, should}@zju.edu.cn
Pseudocode | No | The paper includes diagrams but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code and datasets are publicly available at https://github.com/lianziqt/CFeD.
Open Datasets | Yes | THUCNews [Li et al., 2006]... Sogou CS [Sogou Labs, 2012]... NLPIR Weibo Corpus [NLPIR, 2017]... CIFAR-10 [Krizhevsky, 2009]... CIFAR-100 [Krizhevsky, 2009]... Caltech-256 [Griffin et al., 2007]
Dataset Splits | Yes | For each task, we select 70% of data as the training set, 10% as the validation set and the rest as the test set. (A hedged sketch of this split and the shard assignment appears after the table.)
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU models, CPU types, or cloud computing instances.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python 3.x, PyTorch 1.x) needed to replicate the experiment.
Experiment Setup | Yes | Each task trains the model for R = 20 rounds. For the local updating in each client, the learning epoch is 10 in Domain-IL or 40 in Class-IL. Unless otherwise stated, the constraint factor λ of the EWC method is set to 100000. The temperature of distillation is set to 2 as default. For the configuration of FL, we assume that there are 100 clients, and only random 10% clients are sampled to participate in each training round. The training dataset and surrogate dataset are both divided into 200 shards randomly (IID) or sorted by the category (Non-IID). In each experiment, every client selects two shards of data on each task as the local dataset and also two shards of the surrogate dataset as the local surrogate. In particular, the server also selects two shards for server distillation in the Non-IID distribution. All above selections are conducted randomly. For each task, we select 70% of data as the training set, 10% as the validation set and the rest as the test set. (Hedged sketches of the partitioning, the federated schedule and the temperature-2 distillation loss appear after the table.)
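
To make the quoted split and shard scheme concrete, here is a minimal Python sketch of the 70/10/20 task split and the 200-shard partitioning (shuffled for IID, sorted by category for Non-IID), with each client drawing two shards. The function names and the (sample, label) data layout are assumptions made for illustration; this is not taken from the authors' released code.

    import random

    def split_task(samples, train=0.7, val=0.1, seed=0):
        """70/10/20 train/validation/test split for one task, as described in the paper."""
        rng = random.Random(seed)
        samples = samples[:]          # samples: list of (x, label) pairs (assumed layout)
        rng.shuffle(samples)
        n = len(samples)
        n_train, n_val = int(train * n), int(val * n)
        return (samples[:n_train],
                samples[n_train:n_train + n_val],
                samples[n_train + n_val:])

    def make_shards(data, num_shards=200, iid=True, seed=0):
        """Divide the training (or surrogate) data into 200 shards.
        IID: random order before sharding; Non-IID: sorted by category."""
        rng = random.Random(seed)
        data = data[:]
        if iid:
            rng.shuffle(data)
        else:
            data.sort(key=lambda pair: pair[1])   # sort by label
        shard_size = len(data) // num_shards
        return [data[i * shard_size:(i + 1) * shard_size] for i in range(num_shards)]

    def assign_shards(shards, num_clients=100, shards_per_client=2, seed=0):
        """Each client randomly draws two shards as its local dataset;
        the same scheme would apply to the surrogate dataset."""
        rng = random.Random(seed)
        ids = list(range(len(shards)))
        rng.shuffle(ids)
        return {c: [shards[i] for i in ids[c * shards_per_client:(c + 1) * shards_per_client]]
                for c in range(num_clients)}

With 200 shards, 100 clients and two shards per client, every shard is used exactly once, which matches the numbers quoted in the setup.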
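
The federated schedule in the Experiment Setup row (100 clients, 10% sampled per round, R = 20 rounds per task, 10 or 40 local epochs) can be summarised as the following configuration sketch. local_update and aggregate are hypothetical stubs standing in for the paper's client training and server aggregation steps, not the authors' API.

    import random

    # Values quoted in the Experiment Setup row.
    NUM_CLIENTS    = 100                                 # total clients
    SAMPLE_FRAC    = 0.10                                # 10% of clients per round
    ROUNDS         = 20                                  # R = 20 rounds per task
    LOCAL_EPOCHS   = {"domain_il": 10, "class_il": 40}   # local epochs per scenario
    EWC_LAMBDA     = 1e5                                 # constraint factor of the EWC baseline
    KD_TEMPERATURE = 2.0                                 # default distillation temperature

    def run_task(global_model, clients, scenario, local_update, aggregate, seed=0):
        """Schematic FedAvg-style loop: sample 10% of clients each round,
        run local training, then aggregate the returned updates."""
        rng = random.Random(seed)
        num_sampled = max(1, int(SAMPLE_FRAC * NUM_CLIENTS))
        for _ in range(ROUNDS):
            sampled = rng.sample(clients, num_sampled)
            updates = [local_update(global_model, c, epochs=LOCAL_EPOCHS[scenario])
                       for c in sampled]
            global_model = aggregate(global_model, updates)
        return global_model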
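
The distillation temperature of 2 refers to the usual temperature-scaled softening of teacher and student outputs. Since the paper does not state its software stack, the following generic KL-based distillation loss is written in PyTorch purely as an assumed illustration and is not copied from the CFeD repository.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        """Generic temperature-scaled distillation loss (T = 2 as the paper's default)."""
        log_p_student = F.log_softmax(student_logits / temperature, dim=1)
        p_teacher = F.softmax(teacher_logits / temperature, dim=1)
        # The T^2 factor keeps gradient magnitudes comparable across temperatures.
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2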