Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Class-wise Balancing Data Replay for Federated Class-Incremental Learning

Authors: Zhuang Qi, Ying-Peng Tang, Lei Meng, Han Yu, Xiaoxiao Li, Xiangxu Meng

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments were conducted on three datasets with different levels of heterogeneity, including performance comparisons, ablation studies, in-depth analysis, and case studies. The results demonstrate that FedCBDR effectively balances the number of replayed samples across classes and alleviates the long-tail problem. Compared to six state-of-the-art existing methods, FedCBDR achieves a 2%-15% Top-1 accuracy improvement.
Researcher Affiliation	Academia	1 School of Software, Shandong University, China; 2 College of Computing and Data Science, Nanyang Technological University, Singapore; 3 Department of Electrical and Computer Engineering, University of British Columbia, Canada; 4 Vector Institute, Canada. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1 FEDCBDR
Open Source Code	Yes	The code will be made available as supplementary material.
Open Datasets	Yes	Following existing studies [27, 30], we conducted all experiments on three commonly used datasets, including CIFAR10 [53, 54], CIFAR100 [53, 54] and Tiny ImageNet [55] to validate the effectiveness of the FedCBDR.
Dataset Splits	Yes	We simulate heterogeneous data distributions across clients using the Dirichlet distribution with parameters β = {0.1, 0.5, 1.0}, where smaller values of β correspond to higher level of data heterogeneity. The statistical details are presented in the Table 1. ... The number of stored samples per task varies by dataset and split setting: for CIFAR10, 450 samples are stored under 3-task splits and 300 under 5-task splits; for CIFAR100, 1,000 samples are used for 5-task splits and 500 for 10-task splits; for Tiny ImageNet, 2,000 samples are stored for 10-task splits and 1,000 for 20-task splits.
Hardware Specification	Yes	And training on each client is performed using an NVIDIA RTX 3090 GPU (24 GB).
Software Dependencies	No	The paper mentions 'ResNet-18 as the backbone' and 'SGD optimizer' but does not provide specific version numbers for any software libraries, frameworks, or programming languages.
Experiment Setup	Yes	In the experiments, the number of clients is fixed at K = 5, with each client running local epochs E = 2 per round, using a batch size B = 128. For all datasets, we adopt ResNet-18 as the backbone, with the classifier's output dimension dynamically updated as tasks progress and conduct T = 100 communication rounds per task. The SGD optimizer is employed with a learning rate of 0.01 and a weight decay of 1 × 10⁻⁵. The number of stored samples per task varies by dataset and split setting: for CIFAR10, 450 samples are stored under 3-task splits and 300 under 5-task splits; for CIFAR100, 1,000 samples are used for 5-task splits and 500 for 10-task splits; for Tiny ImageNet, 2,000 samples are stored for 10-task splits and 1,000 for 20-task splits. For the temperature and weighted parameters, we select τ_old ∈ {0.8, 0.9} and w_old ∈ {1.1, 1.2, 1.3, 1.4} for previous tasks, while τ_new ∈ {1.1, 1.2} and w_new ∈ {0.7, 0.8, 0.9} are used for newly arrived tasks.
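
Illustrative sketch (Dataset Splits): the row above says client data is made heterogeneous with a Dirichlet distribution over β. The snippet below shows one common way such a label-skewed split is simulated; it is not taken from the paper or its code. The function name `dirichlet_partition`, the per-class splitting scheme, and the seed handling are all assumptions made for illustration, and the paper's actual partitioning procedure may differ.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_clients=5, beta=0.5, seed=0):
    """Hypothetical helper: split sample indices across clients, class by class.

    For every class, client shares are drawn from Dirichlet(beta). Smaller beta
    concentrates each class on fewer clients (more heterogeneity).
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)

        # A Dirichlet(beta) draw equals independent Gamma(beta, 1) draws, normalised.
        gammas = [rng.gammavariate(beta, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        if total == 0.0:  # guard against numerical underflow for very small beta
            shares = [1.0 / num_clients] * num_clients
        else:
            shares = [g / total for g in gammas]

        # Turn cumulative shares into cut points over this class's indices.
        start, cumulative = 0, 0.0
        for k in range(num_clients):
            cumulative += shares[k]
            if k == num_clients - 1:
                end = len(indices)  # last client takes the remainder
            else:
                end = min(len(indices), round(cumulative * len(indices)))
            end = max(end, start)
            clients[k].extend(indices[start:end])
            start = end
    return clients


# Toy usage: 1,000 samples over 10 classes, K = 5 clients, beta = 0.1.
toy_labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(toy_labels, num_clients=5, beta=0.1, seed=0)
```

Every index is assigned to exactly one client, so the union of `parts` covers the toy dataset once. Rerunning with β = 1.0 should give visibly more even per-class counts per client than β = 0.1.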
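
Illustrative sketch (Experiment Setup): the values quoted in the row above can be collected into a single configuration object for anyone attempting a rerun. All names here (`SetupConfig`, `MEMORY_PER_TASK`, `CANDIDATES`, `candidate_grid`) are hypothetical and do not come from the paper's code. The weight decay is read as 1e-5, which is an interpretation of the extracted text. The source lists candidate values for the temperature and weight parameters but does not say how they were searched, so the full Cartesian product below is only one possible reading.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SetupConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""
    num_clients: int = 5           # K
    local_epochs: int = 2          # E, per communication round
    batch_size: int = 128          # B
    rounds_per_task: int = 100     # T
    backbone: str = "resnet18"
    optimizer: str = "sgd"
    learning_rate: float = 0.01
    weight_decay: float = 1e-5     # assumption: extracted text reads "1 10 5"
    dirichlet_betas: tuple = (0.1, 0.5, 1.0)


# Stored (replayed) samples per task, keyed by (dataset, number of tasks).
MEMORY_PER_TASK = {
    ("CIFAR10", 3): 450,
    ("CIFAR10", 5): 300,
    ("CIFAR100", 5): 1000,
    ("CIFAR100", 10): 500,
    ("TinyImageNet", 10): 2000,
    ("TinyImageNet", 20): 1000,
}

# Candidate temperature (tau) and weight (w) values for old and new tasks.
CANDIDATES = {
    "tau_old": (0.8, 0.9),
    "w_old": (1.1, 1.2, 1.3, 1.4),
    "tau_new": (1.1, 1.2),
    "w_new": (0.7, 0.8, 0.9),
}


def candidate_grid():
    """Enumerate every combination of the candidate values (assumed full sweep)."""
    names = list(CANDIDATES)
    return [dict(zip(names, values)) for values in product(*CANDIDATES.values())]


config = SetupConfig()
grid = candidate_grid()
```

Under the full-sweep assumption the grid has 2 × 4 × 2 × 3 = 48 combinations per dataset and split setting.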