Accurate Forgetting for Heterogeneous Federated Continual Learning

Authors: Abudukelimu Wuerkaixi, Sen Cui, Jingfeng Zhang, Kunda Yan, Bo Han, Gang Niu, Lei Fang, Changshui Zhang, Masashi Sugiyama

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments affirm the superiority of our method over baselines.
Researcher Affiliation | Collaboration | 1 Institute for Artificial Intelligence, Tsinghua University (THUAI), Beijing National Research Center for Information Science and Technology (BNRist), Department of Automation, Tsinghua University, Beijing, P.R. China; 2 The University of Auckland; 3 RIKEN; 4 Hong Kong Baptist University; 5 DataCanvas Technology Co., Ltd.; 6 The University of Tokyo
Pseudocode | Yes | The algorithm of our method is detailed in Algorithm 1.
Open Source Code | Yes | Code is at: https://github.com/zaocan666/AF-FCL.
Open Datasets | Yes | For the EMNIST-based dataset containing 26 classes of handwritten letter images (Cohen et al., 2017), we set up the following two settings with N=8, T=6, C=2: 1) EMNIST-LTP: in the LTP setting, we randomly sample classes from the entire dataset for each client. 2) EMNIST-shuffle: in the conventional shuffle setting, the task sets are consistent across all clients but arranged in different orders. 3) CIFAR100: we randomly sample 20 classes among the 100 classes of CIFAR100 (Krizhevsky et al., 2009) as a task for each of the 10 clients, and there are 4 tasks per client (N=10, T=4, C=20). 4) MNIST-SVHN-F: we set up 10 clients with this mixed dataset; each client contains 6 tasks, and each task has 3 classes. (A task-construction sketch for the LTP setting follows the table.)
Dataset Splits | No | The paper states that for EMNIST-noisy, 'After learning sequentially on all tasks, we evaluate the final three tasks, which do not contain any noisy labels,' but it does not provide general or explicit train/validation/test dataset splits (e.g., percentages, sample counts, or a description of a distinct validation set) for all experiments or for model tuning purposes.
Hardware Specification | Yes | In the experiments, we conduct all methods on a local Linux server that has two physical CPU chips (Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz) and 32 logical cores. All methods are implemented using the PyTorch framework, and all models are trained on GeForce RTX 2080 Ti GPUs.
Software Dependencies | No | The paper states 'All methods are implemented using Pytorch framework' but does not specify the version of PyTorch or of any other software dependencies.
Experiment Setup | Yes | For all experiments except CIFAR100, a learning rate of 1e-4 is used, with 60 global communication rounds and 100 local iterations. For CIFAR100, we set the learning rate to 1e-3, with 40 global communication rounds and 400 local iterations. Consistent with prior research (Yoon et al., 2021a; Qi et al., 2023), all clients participate in each communication round. A mini-batch size of 64 is adopted, and the Adam optimizer is employed for training all models. (A configuration sketch appears below.)
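
The quoted EMNIST-LTP setting (N=8 clients, T=6 tasks per client, C=2 randomly sampled classes per task) can be sketched as below. This is a minimal illustration, not the authors' released code (see the linked repository for that): the function name `build_ltp_tasks`, the use of torchvision's EMNIST "letters" loader, and the seeding are assumptions.

```python
# Minimal sketch of LTP-style task construction, assuming torchvision's
# EMNIST "letters" split (26 handwritten-letter classes).
import random
from collections import defaultdict

from torchvision import transforms
from torchvision.datasets import EMNIST
from torch.utils.data import Subset

N_CLIENTS, N_TASKS, CLASSES_PER_TASK = 8, 6, 2  # N=8, T=6, C=2 (EMNIST-LTP)

def build_ltp_tasks(root="./data", seed=0):
    """Return tasks[client][task] -> Subset restricted to that task's classes."""
    rng = random.Random(seed)
    full = EMNIST(root, split="letters", download=True,
                  transform=transforms.ToTensor())

    # Group sample indices by class label (EMNIST 'letters' labels run 1..26).
    by_class = defaultdict(list)
    for idx, y in enumerate(full.targets):
        by_class[int(y)].append(idx)
    all_classes = sorted(by_class)

    tasks = []
    for _ in range(N_CLIENTS):
        client_tasks = []
        for _ in range(N_TASKS):
            # LTP: classes are drawn independently for every client and task,
            # so label spaces differ across clients (statistical heterogeneity).
            chosen = rng.sample(all_classes, CLASSES_PER_TASK)
            indices = [i for c in chosen for i in by_class[c]]
            client_tasks.append(Subset(full, indices))
        tasks.append(client_tasks)
    return tasks
```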
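The quoted experiment setup maps onto a training configuration like the following sketch. The hyperparameter values are taken from the quote; the `TrainConfig` container, the `Client.local_update` routine, and the FedAvg-style aggregation are illustrative assumptions rather than the paper's actual method.

```python
# Sketch of the quoted training configuration; values come from the quote,
# while the client interface and the aggregation rule are assumptions.
import copy
from dataclasses import dataclass

import torch

@dataclass
class TrainConfig:
    lr: float = 1e-4          # 1e-3 for CIFAR100
    global_rounds: int = 60   # 40 for CIFAR100
    local_iters: int = 100    # 400 for CIFAR100
    batch_size: int = 64      # mini-batch size of 64

CIFAR100_CFG = TrainConfig(lr=1e-3, global_rounds=40, local_iters=400)

def run_rounds(global_model: torch.nn.Module, clients, cfg: TrainConfig):
    """Every client participates in every communication round, as quoted."""
    for _ in range(cfg.global_rounds):
        states = []
        for client in clients:
            local = copy.deepcopy(global_model)
            opt = torch.optim.Adam(local.parameters(), lr=cfg.lr)  # Adam, as quoted
            # `client.local_update` is a hypothetical routine running
            # cfg.local_iters mini-batch steps on the client's current task.
            client.local_update(local, opt, cfg.local_iters, cfg.batch_size)
            states.append(local.state_dict())
        # FedAvg-style parameter averaging (an assumption; the paper's actual
        # aggregation may differ).
        averaged = {k: torch.stack([s[k].float() for s in states]).mean(0)
                    for k in states[0]}
        global_model.load_state_dict(averaged)
```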