Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

FedMGP: Personalized Federated Learning with Multi-Group Text-Visual Prompts

Authors: Weihao Bo, Yanpeng Sun, Yu Wang, Xinyu Zhang, Zechao Li

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we conduct comprehensive experiments to validate the dual capabilities of FedMGP: (1) maintaining strong personalization for individual clients while achieving robust cross-client generalization, and (2) demonstrating superior performance across various heterogeneous data distributions. Our evaluation spans multiple scenarios including non-IID data partitions and Dirichlet distributions with varying concentration parameters, demonstrating FedMGP's effectiveness in addressing the fundamental challenges of federated learning with prompt-based multimodal adaptation.
Researcher Affiliation	Collaboration	(1) Nanjing University of Science and Technology; (2) National University of Singapore; (3) Baidu VIS; (4) University of Auckland
Pseudocode	Yes	A FedMGP ALGORITHM. Algorithm 1 FEDMGP: Federated Learning via Multi-Group Text-Visual Prompt Co-Learning
Open Source Code	No	The code will be released on https://github.com/weihao-bo/FedMGP.git.
Open Datasets	Yes	Caltech101 [13] for general object classification; Oxford Pets [38], Flowers102 [36], Food101 [4], Stanford Cars [25], and FGVC Aircraft [33] for fine-grained classification; DTD [7] for texture classification; UCF101 [47] for action recognition; and SUN397 [54] for scene recognition. We create a pathological non-IID setting by equally splitting each dataset into base and novel classes, then assigning non-overlapping base classes to different clients. Each client's model is trained on local classes and evaluated on three test sets: local classes (personalization), base classes seen by other clients (cross-client knowledge transfer), and novel classes unseen during training (generalization to new concepts). Second, to evaluate personalization under label distribution shift, we employ CIFAR-10 and CIFAR-100 [26], partitioning data among clients using Dirichlet distribution Dir(α) with varying concentration parameters. This creates realistic heterogeneity where clients possess varying class proportions, allowing us to examine how effectively FedMGP's multi-group prompt mechanism adapts to imbalanced class distributions. Third, to assess performance under both feature and label distribution shifts, we test FedMGP on multi-domain datasets: DomainNet [39] with six distinct visual domains and Office-Caltech10 [14] with four domains.
Dataset Splits	Yes	For base-to-novel generalization experiments, we set communication rounds T = 10 with 100% client participation rate, local epochs E = 2, and use 16-shot samples per class. For CIFAR-10 and CIFAR-100 experiments, we simulate a realistic federated environment with Dir(α = 0.5) distribution across 100 clients, with 10% client participation rate per round, utilizing the full training dataset. All models are trained using stochastic gradient descent (SGD) with an initial learning rate of 0.001
Hardware Specification	Yes	All implementations are based on PyTorch and experiments were conducted on NVIDIA RTX 4090 (24GB) or A100 (40GB) GPUs.
Software Dependencies	No	All implementations are based on PyTorch and experiments were conducted on NVIDIA RTX 4090 (24GB) or A100 (40GB) GPUs. No specific version for PyTorch or other software dependencies is provided.
Experiment Setup	Yes	Implementation Details. To ensure fair comparison with existing methods, we establish a unified experimental framework by re-implementing all baseline approaches using their official code repositories under identical settings. Specifically, we adopt ViT-B/16 [10] as the backbone for all methods. For base-to-novel generalization experiments, we set communication rounds T = 10 with 100% client participation rate, local epochs E = 2, and use 16-shot samples per class. For CIFAR-10 and CIFAR-100 experiments, we simulate a realistic federated environment with Dir(α = 0.5) distribution across 100 clients, with 10% client participation rate per round, utilizing the full training dataset. All models are trained using stochastic gradient descent (SGD) with an initial learning rate of 0.001 and a single-step learning rate scheduler. All other implementation specifics, including additional hyperparameter settings, optimization strategies, and evaluation protocols, are detailed in the appendix to ensure reproducibility. For more details, please refer to Appendix C.2.
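
The CIFAR setting quoted in the table (Dir(α = 0.5) across 100 clients, 10% participation per round) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the seed handling, and the per-class splitting scheme are assumptions made for illustration, and the paper's appendix may partition the data differently. The sketch uses only the Python standard library and draws Dirichlet proportions by normalizing Gamma samples.

```python
import random
from collections import defaultdict


def dirichlet_proportions(alpha, num_clients, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_partition(labels, num_clients=100, alpha=0.5, seed=0):
    """Split sample indices across clients, class by class (hypothetical helper).

    For every class, a fresh Dir(alpha) vector decides what share of that
    class each client receives. Smaller alpha gives more skewed clients.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    client_indices = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        props = dirichlet_proportions(alpha, num_clients, rng)
        # Convert proportions into contiguous slices of this class's samples.
        start, cumulative = 0, 0.0
        for client in range(num_clients):
            cumulative += props[client]
            if client == num_clients - 1:
                end = len(indices)  # last client takes whatever remains
            else:
                end = min(len(indices), int(round(cumulative * len(indices))))
            end = max(end, start)
            client_indices[client].extend(indices[start:end])
            start = end
    return client_indices


def sample_participants(num_clients=100, rate=0.1, round_seed=0):
    """Pick the clients that take part in one communication round (hypothetical helper)."""
    rng = random.Random(round_seed)
    k = max(1, round(num_clients * rate))
    return sorted(rng.sample(range(num_clients), k))


if __name__ == "__main__":
    toy_labels = [i % 10 for i in range(5000)]  # stand-in for CIFAR-10 labels
    parts = dirichlet_partition(toy_labels, num_clients=100, alpha=0.5)
    print("samples on first five clients:", [len(p) for p in parts[:5]])
    print("participants in round 0:", sample_participants(round_seed=0))
```

Because each class is split independently, every sample is assigned to exactly one client, and some clients may receive few or no samples of a given class, which is the label skew the Dir(α) setting is meant to produce.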