Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Global Prompt Refinement with Non-Interfering Attention Masking for One-Shot Federated Learning

Authors: Zhuang Qi, Yu Pan, Lei Meng, Sijin Zhou, Han Yu, Xiaoxiao Li, Xiangxu Meng

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments conducted on ten benchmark datasets under two tasks show that GPR-NIAM outperforms eight state-of-the-art methods in both class-level and domain-level generalization.
Researcher Affiliation	Academia	1 School of Software, Shandong University, China; 2 AIM Lab, Faculty of Engineering, Monash University, Clayton, VIC, Australia; 3 College of Computing and Data Science, Nanyang Technological University, Singapore; 4 Department of Electrical and Computer Engineering, University of British Columbia, Canada; 5 Vector Institute, Canada; EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1 GPR-NIAM
Open Source Code	Yes	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: The code will be uploaded as supplementary material.
Open Datasets	Yes	Following existing works [17, 52], experiments are conducted on CIFAR10 [53], Oxford Pets [54], Caltech101 [55], DTD [56], FGVCAircraft [57], Flowers102 [58], Stanford Cars [59], and UCF101 [60] to evaluate base-to-base/novel generalization, and on Office-Home [61] and Domain Net [62] to assess leave-one-domain-out generalization.
Dataset Splits	Yes	Datasets: Following existing works [17, 52], experiments are conducted on CIFAR10 [53], Oxford Pets [54], Caltech101 [55], DTD [56], FGVCAircraft [57], Flowers102 [58], Stanford Cars [59], and UCF101 [60] to evaluate base-to-base/novel generalization, and on Office-Home [61] and Domain Net [62] to assess leave-one-domain-out generalization. Their statistics are shown in Table 5. Table 5: Statistics and federated settings of the datasets used in our experiments. Columns: Dataset | Classes | Train | Test (Base, Novel) | Domains | Federated Settings (Clients, Heterogeneity). CIFAR10 | 10 | 25,000 | 5,000, 5,000 | 1 | 5/10/20 | 0.1/0.5 ... (table continues with train/test numbers for all datasets)
Hardware Specification	Yes	And each client has one NVIDIA RTX 3090 with 24 GB GPU for training.
Software Dependencies	No	Implementation Details In all experiments, we set local training and global refinement epochs to 10, using the SGD optimizer with a learning rate of 0.001, a weight decay of 0.001, and a batch size of 32. The number of communication rounds is 1.
Experiment Setup	Yes	Implementation Details In all experiments, we set local training and global refinement epochs to 10, using the SGD optimizer with a learning rate of 0.001, a weight decay of 0.001, and a batch size of 32. The number of communication rounds is 1. We simulate 50 clients for CIFAR-10 and 10 clients for other datasets in base-to-base/novel generalization, while using 9 clients for Office-Home and 15 for Domain Net. To simulate non-IID data, we adopt a Dirichlet distribution with parameter β = 0.5. Each client generates n ∈ {5, 10, 20} visual prototypes per class, and the reweighting parameter λ is selected from {0.2, 0.5, 0.7, 1.0}.
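The setup states that non-IID client data is simulated with a Dirichlet distribution (β = 0.5) but does not spell out the partitioning procedure. The sketch below shows one common label-skew scheme under that assumption: for each class, client shares are drawn from a symmetric Dirichlet(β) and that class's samples are split accordingly. The function names `dirichlet_sample` and `dirichlet_partition`, the rounding of cut points, and the use of the standard library instead of NumPy are all illustrative choices, not the authors' code.

```python
import random
from collections import defaultdict


def dirichlet_sample(alpha: float, k: int, rng: random.Random) -> list[float]:
    """Draw one sample from a symmetric Dirichlet(alpha, ..., alpha) over k parts."""
    # A Dirichlet vector is a set of independent Gamma(alpha, 1) draws, normalised to sum to 1.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_partition(labels: list[int], num_clients: int, beta: float = 0.5,
                        seed: int = 0) -> list[list[int]]:
    """Hypothetical helper: split sample indices across clients with per-class
    Dirichlet(beta) proportions. Smaller beta gives more skewed (more non-IID) clients.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class: dict[int, list[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients: list[list[int]] = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y][:]
        rng.shuffle(idxs)
        props = dirichlet_sample(beta, num_clients, rng)

        # Turn cumulative proportions into integer cut points; the last client
        # receives whatever remains, so every index is assigned exactly once.
        cuts, acc = [], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(int(round(acc * len(idxs))))

        start = 0
        for c, end in enumerate(cuts + [len(idxs)]):
            clients[c].extend(idxs[start:end])
            start = end
    return clients


if __name__ == "__main__":
    # Toy example: 1,000 samples over 10 classes split across 10 clients.
    labels = [i % 10 for i in range(1000)]
    parts = dirichlet_partition(labels, num_clients=10, beta=0.5)
    print([len(p) for p in parts])
```

With β = 0.5 most clients end up dominated by a few classes; raising β toward larger values makes the per-client label mix approach uniform. How the paper then samples base versus novel classes per client is not stated here, so that step is left out.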