Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

LLM at Network Edge: A Layer-wise Efficient Federated Fine-tuning Approach

Authors: Jinglong Shen, Nan Cheng, Wenchao Xu, Haozhao Wang, Yifan Guo, Jiajie Xu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on diverse datasets demonstrate that LEFF attains superior computational efficiency and model performance compared to existing federated fine-tuning methods, particularly under heterogeneous conditions.
Researcher Affiliation	Academia	(1) School of Telecommunications Engineering, Xidian University; (2) Department of Computing, The Hong Kong Polytechnic University; (3) School of Computer Science and Technology, Huazhong University of Science and Technology. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode	No	The paper includes figures (e.g., Figure 1: Overview of the LEFF framework, Figure 3: Distillation process in LEFF) to illustrate its methodology and architecture, but it does not present any explicitly labeled pseudocode blocks or algorithms in a structured, step-by-step format.
Open Source Code	No	The paper does not contain an explicit statement about releasing its source code for the methodology described, nor does it provide a link to a code repository. The NeurIPS Paper Checklist in the document confirms this under '5. Open access to data and code': 'Answer: [No] Justification: The paper uses public datasets (GLUE, E2E NLG, WebNLG, WNLI) and existing pre-trained models (GPT-2, DeBERTa V3), which are cited. However, it does not explicitly state that the code for the proposed LEFF method or the experimental scripts are publicly available, nor does it provide a link or instructions for accessing them.'
Open Datasets	Yes	We evaluate on the GLUE benchmark Wang et al. (2019) (e.g., CoLA, MRPC, MNLI) for NLU tasks and the E2E NLG Challenge Novikova et al. (2017) for NLG from structured meaning representations. Experiments were conducted for 20 communication rounds on the public datasets WebNLG Gardent et al. (2017) and WNLI Wang et al. (2019).
Dataset Splits	No	To simulate heterogeneous (non-IID) data distributions, client data was partitioned using a Dirichlet distribution with a concentration parameter (α) ranging from 0.05 to 50.0. Our FL simulations involved a varying number of clients, each performing one local training epoch per communication round.
Hardware Specification	Yes	All experiments were performed on a system with eight NVIDIA H100 GPUs.
Software Dependencies	No	The paper mentions using the AdamW optimizer Loshchilov & Hutter (2019) and refers to pre-trained models like GPT-2 Medium Radford et al. (2019) and DeBERTa V3 Base He et al. (2021), but it does not specify version numbers for any software libraries, programming languages, or development environments used (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup	Yes	Experiments were conducted for 20 communication rounds on the public datasets WebNLG Gardent et al. (2017) and WNLI Wang et al. (2019). To simulate heterogeneous (non-IID) data distributions, client data was partitioned using a Dirichlet distribution with a concentration parameter (α) ranging from 0.05 to 50.0. Our FL simulations involved a varying number of clients, each performing one local training epoch per communication round. During local training, clients fine-tuned their local models using the AdamW optimizer Loshchilov & Hutter (2019) with a learning rate of 1 × 10⁻⁵.
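
The Dataset Splits and Experiment Setup rows describe a Dirichlet-based non-IID partition but do not give the exact procedure. The sketch below shows one common reading, label-skew partitioning, in which each class is split across clients according to proportions drawn from a symmetric Dirichlet(α). This is an assumption, not the authors' confirmed method; the function names, the seed, and the per-class splitting rule are hypothetical and chosen for illustration only. Smaller α (e.g., 0.05) concentrates each class on a few clients, while larger α (e.g., 50.0) approaches a uniform split.

```python
import random
from collections import defaultdict


def dirichlet_sample(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over k categories
    by normalizing k independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:
        # Guard against numerical underflow for very small alpha:
        # fall back to putting all mass on one random category.
        out = [0.0] * k
        out[rng.randrange(k)] = 1.0
        return out
    return [d / total for d in draws]


def partition_by_label(labels, num_clients, alpha, seed=0):
    """Assumed label-skew scheme (hypothetical helper, not from the paper):
    for every class, split its sample indices across clients according to
    Dirichlet(alpha) proportions. Returns one index list per client."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        props = dirichlet_sample(alpha, num_clients, rng)

        # Turn cumulative proportions into cut points over this class.
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += props[c]
            if c == num_clients - 1:
                end = len(idxs)  # last client takes whatever remains
            else:
                end = min(len(idxs), max(start, round(cum * len(idxs))))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


if __name__ == "__main__":
    # Toy data: 400 samples, 4 balanced classes, 8 clients (all illustrative).
    toy_labels = [i % 4 for i in range(400)]
    for a in (0.05, 50.0):
        parts = partition_by_label(toy_labels, num_clients=8, alpha=a)
        print(f"alpha={a}: client sizes = {[len(p) for p in parts]}")
```

Every sample index is assigned to exactly one client, so the partition stays a valid split of the dataset for any α. Some clients may receive very few or no samples at small α; whether the paper enforces a minimum client size is not stated.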