Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Class-wise Balancing Data Replay for Federated Class-Incremental Learning

Authors: Zhuang Qi, Ying-Peng Tang, Lei Meng, Han Yu, Xiaoxiao Li, Xiangxu Meng

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments were conducted on three datasets with different levels of heterogeneity, including performance comparisons, ablation studies, in-depth analysis, and case studies. The results demonstrate that FedCBDR effectively balances the number of replayed samples across classes and alleviates the long-tail problem. Compared to six state-of-the-art existing methods, FedCBDR achieves a 2%-15% Top-1 accuracy improvement.
Researcher Affiliation | Academia | (1) School of Software, Shandong University, China; (2) College of Computing and Data Science, Nanyang Technological University, Singapore; (3) Department of Electrical and Computer Engineering, University of British Columbia, Canada; (4) Vector Institute, Canada
Pseudocode | Yes | Algorithm 1 FEDCBDR
Open Source Code | Yes | The code will be made available as supplementary material.
Open Datasets | Yes | Following existing studies [27, 30], we conducted all experiments on three commonly used datasets, including CIFAR10 [53, 54], CIFAR100 [53, 54] and Tiny ImageNet [55] to validate the effectiveness of the FedCBDR.
Dataset Splits | Yes | We simulate heterogeneous data distributions across clients using the Dirichlet distribution with parameters β = {0.1, 0.5, 1.0}, where smaller values of β correspond to higher level of data heterogeneity. The statistical details are presented in the Table 1. ... The number of stored samples per task varies by dataset and split setting: for CIFAR10, 450 samples are stored under 3-task splits and 300 under 5-task splits; for CIFAR100, 1,000 samples are used for 5-task splits and 500 for 10-task splits; for Tiny ImageNet, 2,000 samples are stored for 10-task splits and 1,000 for 20-task splits.
Hardware Specification | Yes | And training on each client is performed using an NVIDIA RTX 3090 GPU (24 GB).
Software Dependencies | No | The paper mentions 'ResNet-18 as the backbone' and 'SGD optimizer' but does not provide specific version numbers for any software libraries, frameworks, or programming languages.
Experiment Setup | Yes | In the experiments, the number of clients is fixed at K = 5, with each client running local epochs E = 2 per round, using a batch size B = 128. For all datasets, we adopt ResNet-18 as the backbone, with the classifier's output dimension dynamically updated as tasks progress and conduct T = 100 communication rounds per task. The SGD optimizer is employed with a learning rate of 0.01 and a weight decay of 1 × 10^-5. The number of stored samples per task varies by dataset and split setting: for CIFAR10, 450 samples are stored under 3-task splits and 300 under 5-task splits; for CIFAR100, 1,000 samples are used for 5-task splits and 500 for 10-task splits; for Tiny ImageNet, 2,000 samples are stored for 10-task splits and 1,000 for 20-task splits. For the temperature and weighted parameters, we select τ_old ∈ {0.8, 0.9} and w_old ∈ {1.1, 1.2, 1.3, 1.4} for previous tasks, while τ_new ∈ {1.1, 1.2} and w_new ∈ {0.7, 0.8, 0.9} are used for newly arrived tasks.
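
Illustrative sketch (not from the paper): the Dataset Splits entry above says client heterogeneity is simulated with a Dirichlet distribution with parameter β. One common way to do this in federated learning work is to draw, for each class, a vector of client proportions from Dirichlet(β) and cut that class's samples accordingly. The per-class scheme, the function name dirichlet_partition, the rounding rule, and the seed handling are all assumptions made here for illustration; the quoted text does not specify the authors' exact procedure. The code uses only the Python standard library.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_clients=5, beta=0.5, seed=0):
    """Assign sample indices to clients, class by class.

    Hypothetical helper: for every class, client shares are drawn from a
    symmetric Dirichlet(beta). Smaller beta gives more skewed shares.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    client_indices = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)

        # A Dirichlet sample is a set of independent Gamma(beta, 1) draws
        # divided by their sum.
        draws = [rng.gammavariate(beta, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        if total > 0:
            shares = [d / total for d in draws]
        else:
            # Numerical fallback if every draw underflows to zero.
            shares = [1.0 / num_clients] * num_clients

        # Convert cumulative shares into cut points over this class's samples.
        start, cumulative = 0, 0.0
        for k in range(num_clients):
            cumulative += shares[k]
            if k == num_clients - 1:
                end = len(idxs)  # last client takes the remainder
            else:
                end = min(len(idxs), round(cumulative * len(idxs)))
            end = max(end, start)
            client_indices[k].extend(idxs[start:end])
            start = end

    return client_indices


if __name__ == "__main__":
    # Toy labels standing in for a 10-class dataset (assumption, not real data).
    toy_labels = [i % 10 for i in range(1000)]
    for beta in (0.1, 0.5, 1.0):
        parts = dirichlet_partition(toy_labels, num_clients=5, beta=beta)
        print(beta, [len(p) for p in parts])
```

Running the script prints the per-client sample counts for each β on toy labels. Every sample is assigned to exactly one client, and fixing the seed makes the split repeatable.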