Continual Federated Learning Based on Knowledge Distillation
Authors: Yuhang Ma, Zhongle Xie, Jue Wang, Ke Chen, Lidan Shou
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate two scenarios of CFL by varying the data distribution and adding categories on text and image classification datasets. CFeD outperforms existing FL methods in overcoming forgetting without sacrificing the ability to learn new tasks. |
| Researcher Affiliation | Academia | Yuhang Ma¹, Zhongle Xie¹, Jue Wang¹, Ke Chen¹ and Lidan Shou¹,². ¹College of Computer Science and Technology, Zhejiang University; ²State Key Laboratory of CAD&CG, Zhejiang University. {myh0032, xiezl, zjuwangjue, chenk, should}@zju.edu.cn |
| Pseudocode | No | The paper includes diagrams but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and datasets are publicly available at https://github.com/lianziqt/CFeD. |
| Open Datasets | Yes | THUCNews [Li et al., 2006]... Sogou CS [Sogou Labs, 2012]... NLPIR Weibo Corpus [NLPIR, 2017]... CIFAR-10 [Krizhevsky, 2009]... CIFAR-100 [Krizhevsky, 2009]... Caltech-256 [Griffin et al., 2007] |
| Dataset Splits | Yes | For each task, we select 70% of data as the training set, 10% as the validation set and the rest as the test set. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU models, CPU types, or cloud computing instances. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python 3.x, PyTorch 1.x) needed to replicate the experiment. |
| Experiment Setup | Yes | Each task trains the model for R = 20 rounds. For the local updating in each client, the learning epoch is 10 in Domain-IL or 40 in Class-IL. Unless otherwise stated, the constraint factor λ of the EWC method is set to 100000. The temperature of distillation is set to 2 by default. For the FL configuration, we assume that there are 100 clients, and only a random 10% of clients are sampled to participate in each training round. The training dataset and surrogate dataset are both divided into 200 shards, either randomly (IID) or sorted by category (Non-IID). In each experiment, every client selects two shards of data on each task as the local dataset and also two shards of the surrogate dataset as the local surrogate. In particular, the server also selects two shards for server distillation in the Non-IID distribution. All of the above selections are made randomly. For each task, we select 70% of the data as the training set, 10% as the validation set and the rest as the test set. |
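
The 70/10/20 per-task split reported in the "Dataset Splits" row can be stated compactly in code. Below is a minimal sketch, assuming each task's examples are already gathered in a Python list; the helper name `split_task` and the use of scikit-learn's `train_test_split` are illustrative choices, not taken from the authors' released code.

```python
from sklearn.model_selection import train_test_split

def split_task(examples, seed=0):
    """Illustrative 70/10/20 split of one task's examples into train/val/test."""
    # Hold out 30% of the examples, then divide that holdout into
    # a 10% validation set and a 20% test set of the original data.
    train, holdout = train_test_split(examples, test_size=0.3, random_state=seed)
    val, test = train_test_split(holdout, test_size=2 / 3, random_state=seed)
    return train, val, test
```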
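
The numbers in the "Experiment Setup" row translate directly into a federated-learning configuration. The sketch below restates them as constants and shows the two pieces that need code beyond plain hyperparameters: the 200-shard partitioning (random for IID, label-sorted for Non-IID) and the random sampling of 10% of the clients per round. All names (`CONFIG`, `make_shards`, `sample_clients`) are hypothetical and only mirror the quoted values; this is not the authors' implementation.

```python
import random

# Hyperparameters quoted from the paper's experiment setup (names are illustrative).
CONFIG = {
    "rounds_per_task": 20,         # R = 20 communication rounds per task
    "local_epochs_domain_il": 10,  # local epochs per round in Domain-IL
    "local_epochs_class_il": 40,   # local epochs per round in Class-IL
    "ewc_lambda": 100_000,         # constraint factor λ for the EWC baseline
    "distill_temperature": 2,      # knowledge-distillation temperature
    "num_clients": 100,
    "client_fraction": 0.1,        # 10% of clients sampled each round
    "num_shards": 200,             # shards for both training and surrogate data
    "shards_per_client": 2,        # per task, for local data and local surrogate
}

def make_shards(dataset, num_shards=200, iid=True):
    """Partition a dataset into shards: random order (IID) or sorted by label (Non-IID)."""
    order = list(range(len(dataset)))
    if iid:
        random.shuffle(order)
    else:
        order.sort(key=lambda i: dataset[i][1])  # assumes (features, label) pairs
    size = len(order) // num_shards
    return [order[i * size:(i + 1) * size] for i in range(num_shards)]

def sample_clients(round_seed):
    """Sample the random 10% of clients that participate in one training round."""
    rng = random.Random(round_seed)
    k = int(CONFIG["num_clients"] * CONFIG["client_fraction"])
    return rng.sample(range(CONFIG["num_clients"]), k)
```

Per the quoted setup, each client would then draw two shards from the training partition and two from the surrogate partition for every task, with the server additionally drawing two shards for server-side distillation under the Non-IID setting.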