Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

FedGPS: Statistical Rectification Against Data Heterogeneity in Federated Learning

Authors: Zhiqin Yang, Yonggang Zhang, Chenxin Li, Yiu-ming Cheung, Bo Han, Yixuan Yuan

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments show that FedGPS outperforms state-of-the-art methods across diverse heterogeneity scenarios, validating its effectiveness and robustness. The results presented in Fig. 1, Tabs. 1, 2, 3 and 4 show that most methods exhibit limited robustness. Extensive experiments conducted on three benchmark datasets confirm the effectiveness of FedGPS, showcasing its superior performance across diverse scenarios.
Researcher Affiliation	Academia	The Chinese University of Hong Kong; Hong Kong Baptist University; The Hong Kong University of Science and Technology
Pseudocode	Yes	C.3 Process and Pseudocode of Algorithm. In the Algorithm section, we elaborate on the pseudocode workflow of FedGPS in Algorithm 1. Consistent with other federated learning (FL) frameworks, we predefine the number of communication … Table 8: Top-1 accuracy of baselines and our method FedGPS with 5 different heterogeneous scenarios on SVHN, heterogeneity degree α = 0.1, local epochs E = 1 and total client number K = 100. … Algorithm 1: Pseudo-code of FedGPS
Open Source Code	Yes	The code is available at: https://github.com/CUHK-AIM-Group/FedGPS.
Open Datasets	Yes	Following [3, 14, 57], we evaluate our method on three standard datasets: CIFAR-10, CIFAR-100 [58], and SVHN [59].
Dataset Splits	Yes	To simulate a heterogeneous data distribution across clients, we employ the Dirichlet partitioning method, a common approach in recent FL works [8, 57, 51]. This method draws client data proportions q from a Dirichlet distribution, q ∼ Dir(αp), where α is the concentration parameter that controls the degree of heterogeneity. We use α = 0.1, but vary the random seed to generate multiple distinct heterogeneous data distributions. Examples of these distributions are shown in Fig. 2(a) and 2(b). We simulate cross-silo scenarios using 10 clients and cross-device scenarios using 100 clients. We set the sampling rate λs to 50% for the cross-silo scenario and 10% for the cross-device scenario. We set local epochs E = 1 (results for different local epochs are shown in Appendix D.3).
Hardware Specification	No	The paper does not explicitly specify hardware details like GPU/CPU models or specific compute resources used for experiments in the main text or supplementary materials.
Software Dependencies	No	The paper mentions using 'SGD optimizer' but does not specify any software libraries or frameworks with their version numbers (e.g., PyTorch 1.x, TensorFlow 2.x).
Experiment Setup	Yes	To ensure a fair and direct comparison, all methods were evaluated under identical conditions, including the same data partitioning, sampling rate, local epochs, and communication rounds. We use the SGD optimizer with a learning rate of 0.01, momentum of 0.9, and weight decay of 1e-5 (also denoted as λ3). Among the hyperparameters, λ1 and λ2 were both set to 0.1, and λg is fixed at 0.5 for the main experiments (details are given in Appendix C). Table 7 in Appendix C.2 provides a detailed list of hyperparameters for all compared baselines and FedGPS, including learning rate, momentum, weight decay, Nesterov, and other method-specific parameters.
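The Dirichlet partitioning and client-sampling setup described in the Dataset Splits row can be illustrated with a minimal NumPy sketch. It is not taken from the FedGPS repository. It uses the per-class variant found in many FL codebases: for each class, a client-proportion vector is drawn from Dir(α·1) and that class's samples are divided accordingly. The quoted formulation q ∼ Dir(αp) instead draws per-client class proportions, and which variant FedGPS actually uses is an assumption here. The function names `dirichlet_partition` and `sample_clients` and the toy labels are hypothetical.

```python
import numpy as np


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients using per-class Dirichlet draws.

    Hypothetical helper. For each class, a client-proportion vector is drawn
    from Dir(alpha * 1_K) and that class's (shuffled) samples are cut into
    K contiguous chunks of those sizes. Smaller alpha -> more skewed splits.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Share of class c that each client receives (sums to 1).
        q = rng.dirichlet(alpha * np.ones(num_clients))
        # Cumulative shares -> integer cut points into idx.
        cuts = (np.cumsum(q)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            client_indices[k].extend(part.tolist())
    return [np.array(ix, dtype=int) for ix in client_indices]


def sample_clients(num_clients, rate, rng):
    """Hypothetical helper: pick round(rate * K) distinct clients (at least one)."""
    m = max(1, int(round(rate * num_clients)))
    return np.sort(rng.choice(num_clients, size=m, replace=False))


if __name__ == "__main__":
    # Hypothetical toy labels: 10 classes x 500 samples each.
    labels = np.repeat(np.arange(10), 500)
    # Changing `seed` yields a different heterogeneous split at the same alpha.
    parts = dirichlet_partition(labels, num_clients=10, alpha=0.1, seed=1)
    print([len(p) for p in parts])
    rng = np.random.default_rng(0)
    print(sample_clients(10, 0.5, rng))    # cross-silo: 5 of 10 clients
    print(sample_clients(100, 0.1, rng))   # cross-device: 10 of 100 clients
```

Every sample is assigned to exactly one client. With α = 0.1, most clients typically receive only a few classes, which is the label skew the row describes.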
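The Experiment Setup row reports SGD with a learning rate of 0.01, momentum of 0.9, and weight decay of 1e-5 (λ3), but the paper does not name a framework. The NumPy sketch below shows what one local update with those values looks like. It assumes the non-Nesterov, zero-dampening rule with coupled L2 weight decay that is documented for `torch.optim.SGD`. Whether FedGPS enables Nesterov is not stated in this row (Table 7 lists it per method). The function name `sgd_momentum_step` is hypothetical.

```python
import numpy as np


def sgd_momentum_step(w, grad, buf, lr=0.01, momentum=0.9, weight_decay=1e-5):
    """One heavy-ball SGD step with coupled L2 weight decay (hypothetical helper).

    Assumes the non-Nesterov, zero-dampening update documented for
    torch.optim.SGD (buf starts at zero, which matches PyTorch's
    first-step initialisation when dampening is 0):
        g   = grad + weight_decay * w
        buf = momentum * buf + g
        w   = w - lr * buf
    """
    g = grad + weight_decay * w
    buf = momentum * buf + g
    return w - lr * buf, buf


if __name__ == "__main__":
    # Toy objective f(w) = 0.5 * ||w||^2, whose gradient is w itself.
    w = np.array([1.0, -2.0])
    buf = np.zeros_like(w)
    for _ in range(500):
        w, buf = sgd_momentum_step(w, w, buf)
    print(w)  # close to [0, 0]
```

Weight decay here is folded into the gradient before the momentum update, as in PyTorch's SGD. This differs from the decoupled decay used by optimizers such as AdamW.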