Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Adaptive LoRA Experts Allocation and Selection for Federated Fine-Tuning

Authors: Lei Wang, Jieming Bian, Letian Zhang, Jie Xu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our extensive experiments on diverse benchmark datasets demonstrate that FedLEASE significantly outperforms existing federated fine-tuning approaches in heterogeneous client settings while maintaining communication efficiency.
Researcher Affiliation	Academia	Lei Wang, University of Florida, Gainesville, FL 32611; Jieming Bian, University of Florida, Gainesville, FL 32611; Letian Zhang, Middle Tennessee State University, Murfreesboro, TN 37132; Jie Xu, University of Florida, Gainesville, FL 32611
Pseudocode	Yes	A detailed Algorithm 1 can be found in Section A. Algorithm 1 FedLEASE: Federated Low-Rank Expert Learning
Open Source Code	Yes	We include our proposed method's code in the supplemental material. The official implement can be found at https://github.com/lei-wang-link/FedLEASE.
Open Datasets	Yes	For the NLU task, we use RoBERTa [30] as the pre-trained model and fine-tune it on the GLUE benchmark [40]. For the NLG task, we adopt LLaMA2 [39] as the pre-trained model and fine-tune it on the FLAN dataset [10].
Dataset Splits	Yes	For the NLU task, we consider 16 clients in total, with four clients assigned to each of the four GLUE datasets. Each client's data is randomly partitioned from the corresponding full dataset. ... For NLG tasks, a total of 8 clients are considered, with each dataset assigned to two clients. Each client has 600 training samples and 200 test samples.
Hardware Specification	Yes	As shown in Table 15, we measured the clustering time (3.11 seconds on Intel Xeon Platinum 8570 CPU), which is significantly shorter than the total training time (193.49 seconds with local training on the NVIDIA B200 GPU)
Software Dependencies	No	For the NLU task, we use RoBERTa [30] as the pre-trained model and fine-tune it on the GLUE benchmark [40]. For the NLG task, we adopt LLaMA2 [39] as the pre-trained model and fine-tune it on the FLAN dataset [10]. RoBERTa Large (355M) [30] (24 transformer layers) from Hugging Face is used as the base model. AdamW is adopted as the optimizer for all methods... The paper does not specify versions for Python, PyTorch, or other libraries.
Experiment Setup	Yes	For the NLU task... AdamW is adopted as the optimizer for all methods, with a batch size of 128, local epochs set to 2, and a total of 25 communication rounds... LoRA is applied to the query and value projections in the attention layers, and the classification head is frozen after initialization. For our method, the upper bound of experts M_max is set to 8 and the LoRA rank to 4. Baselines are configured to ensure comparable computational workloads. Learning rates are selected via grid search from η ∈ {1E-4, 3E-4, 5E-4, 1E-3, 3E-3, 5E-3}. ... For NLG tasks... All methods use AdamW as the optimizer, with a batch size of 8, local epochs set to 2, 10 communication rounds and the upper bound of experts M_max is set to 8. LoRA is applied to the query and value matrices in the attention layers, with a LoRA rank of 8. Learning rates are selected via grid search from η ∈ {1E-4, 3E-4, 1E-3, 3E-3, 1E-2}.
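
For readers who want to compare the two quoted setups side by side, the hyperparameters in the Experiment Setup and Dataset Splits rows can be collected into a small configuration record. This is only an illustrative sketch: the class name `FedSetup`, its field names, and the helper `total_local_epochs` are hypothetical and do not come from the paper or its repository. The helper also assumes that every client trains in every communication round, which the quoted text does not state.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FedSetup:
    """Hyperparameters quoted in the table above (field names are hypothetical)."""
    task: str
    base_model: str
    num_clients: int
    batch_size: int
    local_epochs: int
    rounds: int
    lora_rank: int
    max_experts: int                 # upper bound on experts, M_max
    lr_grid: Tuple[float, ...]       # learning rates searched by grid search
    optimizer: str = "AdamW"
    lora_targets: Tuple[str, ...] = ("query", "value")


# Values copied from the Experiment Setup and Dataset Splits rows.
NLU = FedSetup(
    task="NLU", base_model="RoBERTa Large (355M)", num_clients=16,
    batch_size=128, local_epochs=2, rounds=25, lora_rank=4, max_experts=8,
    lr_grid=(1e-4, 3e-4, 5e-4, 1e-3, 3e-3, 5e-3),
)
NLG = FedSetup(
    task="NLG", base_model="LLaMA2", num_clients=8,
    batch_size=8, local_epochs=2, rounds=10, lora_rank=8, max_experts=8,
    lr_grid=(1e-4, 3e-4, 1e-3, 3e-3, 1e-2),
)


def total_local_epochs(setup: FedSetup) -> int:
    """Local epochs one client runs over the whole training run.

    Assumption: the client participates in every communication round.
    """
    return setup.local_epochs * setup.rounds


if __name__ == "__main__":
    for s in (NLU, NLG):
        print(s.task, total_local_epochs(s), "local epochs per client,",
              len(s.lr_grid), "learning rates in the grid")
```

Under that participation assumption, a client runs 50 local epochs in the NLU setting and 20 in the NLG setting, and each method's grid search covers 6 and 5 learning rates respectively.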