Continual Federated Learning Based on Knowledge Distillation

Authors: Yuhang Ma, Zhongle Xie, Jue Wang, Ke Chen, Lidan Shou

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate two scenarios of CFL by varying the data distribution and adding categories on text and image classification datasets. CFeD outperforms existing FL methods in overcoming forgetting without sacrificing the ability to learn new tasks.
Researcher Affiliation | Academia | Yuhang Ma1, Zhongle Xie1, Jue Wang1, Ke Chen1 and Lidan Shou1,2; 1College of Computer Science and Technology, Zhejiang University; 2State Key Laboratory of CAD&CG, Zhejiang University; {myh0032, xiezl, zjuwangjue, chenk, should}@zju.edu.cn
Pseudocode | No | The paper includes diagrams but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code and datasets are publicly available at https://github.com/lianziqt/CFeD.
Open Datasets | Yes | THUCNews [Li et al., 2006]... Sogou CS [Sogou Labs, 2012]... NLPIR Weibo Corpus [NLPIR, 2017]... CIFAR-10 [Krizhevsky, 2009]... CIFAR-100 [Krizhevsky, 2009]... Caltech-256 [Griffin et al., 2007]
Dataset Splits | Yes | For each task, we select 70% of data as the training set, 10% as the validation set and the rest as the test set. (A hedged sketch of this split and the shard assignment appears after the table.)
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU models, CPU types, or cloud computing instances.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python 3.x, PyTorch 1.x) needed to replicate the experiment.
Experiment Setup | Yes | Each task trains the model for R = 20 rounds. For the local updating in each client, the learning epoch is 10 in Domain-IL or 40 in Class-IL. Unless otherwise stated, the constraint factor λ of the EWC method is set to 100000. The temperature of distillation is set to 2 as default. For the configuration of FL, we assume that there are 100 clients, and only random 10% clients are sampled to participate in each training round. The training dataset and surrogate dataset are both divided into 200 shards randomly (IID) or sorted by the category (Non-IID). In each experiment, every client selects two shards of data on each task as the local dataset and also two shards of the surrogate dataset as the local surrogate. In particular, the server also selects two shards for server distillation in the Non-IID distribution. All above selections are conducted randomly. For each task, we select 70% of data as the training set, 10% as the validation set and the rest as the test set. (Hedged sketches of the partitioning, the federated schedule and the temperature-2 distillation loss appear after the table.)
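
To make the quoted split and shard scheme concrete, here is a minimal Python sketch of the 70/10/20 task split and the 200-shard partitioning (shuffled for IID, sorted by category for Non-IID), with each client drawing two shards. The function names and the (sample, label) data layout are assumptions made for illustration; this is not taken from the authors' released code.

    import random

    def split_task(samples, train=0.7, val=0.1, seed=0):
        """70/10/20 train/validation/test split for one task, as described in the paper."""
        rng = random.Random(seed)
        samples = samples[:]          # samples: list of (x, label) pairs (assumed layout)
        rng.shuffle(samples)
        n = len(samples)
        n_train, n_val = int(train * n), int(val * n)
        return (samples[:n_train],
                samples[n_train:n_train + n_val],
                samples[n_train + n_val:])

    def make_shards(data, num_shards=200, iid=True, seed=0):
        """Divide the training (or surrogate) data into 200 shards.
        IID: random order before sharding; Non-IID: sorted by category."""
        rng = random.Random(seed)
        data = data[:]
        if iid:
            rng.shuffle(data)
        else:
            data.sort(key=lambda pair: pair[1])   # sort by label
        shard_size = len(data) // num_shards
        return [data[i * shard_size:(i + 1) * shard_size] for i in range(num_shards)]

    def assign_shards(shards, num_clients=100, shards_per_client=2, seed=0):
        """Each client randomly draws two shards as its local dataset;
        the same scheme would apply to the surrogate dataset."""
        rng = random.Random(seed)
        ids = list(range(len(shards)))
        rng.shuffle(ids)
        return {c: [shards[i] for i in ids[c * shards_per_client:(c + 1) * shards_per_client]]
                for c in range(num_clients)}

With 200 shards, 100 clients and two shards per client, every shard is used exactly once, which matches the numbers quoted in the setup.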
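
The federated schedule in the Experiment Setup row (100 clients, 10% sampled per round, R = 20 rounds per task, 10 or 40 local epochs) can be summarised as the following configuration sketch. local_update and aggregate are hypothetical stubs standing in for the paper's client training and server aggregation steps, not the authors' API.

    import random

    # Values quoted in the Experiment Setup row.
    NUM_CLIENTS    = 100                                 # total clients
    SAMPLE_FRAC    = 0.10                                # 10% of clients per round
    ROUNDS         = 20                                  # R = 20 rounds per task
    LOCAL_EPOCHS   = {"domain_il": 10, "class_il": 40}   # local epochs per scenario
    EWC_LAMBDA     = 1e5                                 # constraint factor of the EWC baseline
    KD_TEMPERATURE = 2.0                                 # default distillation temperature

    def run_task(global_model, clients, scenario, local_update, aggregate, seed=0):
        """Schematic FedAvg-style loop: sample 10% of clients each round,
        run local training, then aggregate the returned updates."""
        rng = random.Random(seed)
        num_sampled = max(1, int(SAMPLE_FRAC * NUM_CLIENTS))
        for _ in range(ROUNDS):
            sampled = rng.sample(clients, num_sampled)
            updates = [local_update(global_model, c, epochs=LOCAL_EPOCHS[scenario])
                       for c in sampled]
            global_model = aggregate(global_model, updates)
        return global_model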
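
The distillation temperature of 2 refers to the usual temperature-scaled softening of teacher and student outputs. Since the paper does not state its software stack, the following generic KL-based distillation loss is written in PyTorch purely as an assumed illustration and is not copied from the CFeD repository.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        """Generic temperature-scaled distillation loss (T = 2 as the paper's default)."""
        log_p_student = F.log_softmax(student_logits / temperature, dim=1)
        p_teacher = F.softmax(teacher_logits / temperature, dim=1)
        # The T^2 factor keeps gradient magnitudes comparable across temperatures.
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2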