Federated Fine-tuning of Large Language Models under Heterogeneous Tasks and Client Resources

Authors: Jiamu Bai, Daoyuan Chen, Bingchen Qian, Liuyi Yao, Yaliang Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Involving thousands of clients with heterogeneous NLP tasks and client resources, our experiments validate the efficacy of FlexLoRA, with the federated global model achieving consistently better improvement over SOTA FL methods in downstream NLP task performance across various heterogeneous distributions.
Researcher Affiliation | Collaboration | Jiamu Bai (Pennsylvania State University, jvb6867@psu.edu); Daoyuan Chen (Alibaba Group, daoyuanchen.cdy@alibaba-inc.com); Bingchen Qian (Alibaba Group, qianbingchen.qbc@alibaba-inc.com); Liuyi Yao (Alibaba Group, yly287738@alibaba-inc.com); Yaliang Li (Alibaba Group, yaliang.li@alibaba-inc.com)
Pseudocode | Yes | We summarize the pseudocode of FlexLoRA in Algorithms 1 and 2. ... Algorithm 1 FlexLoRA for Federated Learning ... Algorithm 2 FlexLoRA Server Update (a hedged sketch of the server update appears after the table)
Open Source Code | Yes | Our code is made available at https://github.com/alibaba/FederatedScope/tree/FlexLoRA, inviting further research and application in real-world cross-device FL for LLMs.
Open Datasets | Yes | We further make FL clients task-heterogeneous by utilizing the natural instruction dataset [39]. ... We use the Dolly-15K dataset [9], which supports instruction tuning and includes 8 tasks in total.
Dataset Splits | Yes | For all the datasets we used, data for each client is partitioned into training, validation, and testing sets in a ratio of 8:1:1. (A per-client split sketch appears after the table.)
Hardware Specification | Yes | Our experiments are conducted on a cluster equipped with 16 NVIDIA A100 GPUs, each with 40GB or 80GB of memory.
Software Dependencies | Yes | All the experiments are implemented using the PyTorch package with version 2.1.0 and Huggingface's Transformers package with version 4.31.0.
Experiment Setup | Yes | All FL experiments are conducted with a client participation rate of 0.05 in each round, and with an early stopping mechanism that terminates training if the validation loss does not improve over 3 consecutive FL rounds. ... Besides, the batch size is set as 4 via searching from a range of {2, 4, 16}. The maximum token length is 512. ... we grid search their learning rates from a range of {5e-2, 5e-3, 5e-4} for FedAvg, and {5e-4, 1e-4, 5e-5, 1e-5} for FedIT and SLoRA, both accompanied with a linear scheduler which decays from the initial learning rate to 0. (The reported configuration is collected in a sketch after the table.)
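
The server update quoted in the Pseudocode row (Algorithm 2) can be illustrated with a short sketch. This is not the authors' released implementation: it assumes clients upload their full LoRA products B_k A_k together with local data sizes, and that the server redistributes the aggregated update to each client's rank via truncated SVD, as the paper describes. The function and variable names (flexlora_server_update, delta_ws, client_ranks) are illustrative.

```python
import torch

def flexlora_server_update(delta_ws, num_samples, client_ranks):
    """Aggregate full LoRA updates and redistribute them at each client's rank.

    delta_ws     : list of tensors, each the full LoRA product B_k @ A_k
                   with shape (d_out, d_in)
    num_samples  : list of local dataset sizes, used as aggregation weights
    client_ranks : dict mapping client id -> local LoRA rank r_k
    """
    weights = torch.tensor(num_samples, dtype=torch.float32)
    weights = weights / weights.sum()

    # FedAvg-style weighted average over the full-rank updates.
    global_delta = sum(w * dw for w, dw in zip(weights, delta_ws))

    # Decompose the aggregated update and truncate to each client's rank,
    # so the broadcast matrices match that client's local LoRA shapes.
    U, S, Vh = torch.linalg.svd(global_delta, full_matrices=False)
    per_client = {}
    for cid, r in client_ranks.items():
        sqrt_s = torch.sqrt(S[:r])
        B = U[:, :r] * sqrt_s            # (d_out, r)
        A = sqrt_s[:, None] * Vh[:r, :]  # (r, d_in)
        per_client[cid] = (B, A)
    return per_client
```

Truncating the SVD to rank r_k lets resource-constrained clients receive a low-rank approximation of the same aggregated update that higher-capacity clients receive at larger ranks.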
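The 8:1:1 per-client partition from the Dataset Splits row can be reproduced in a few lines. The helper name and fixed seed below are assumptions for illustration, not taken from the released code.

```python
import torch
from torch.utils.data import random_split

def split_client_data(dataset, seed=42):
    """Partition one client's local data into train/val/test at an 8:1:1 ratio."""
    n = len(dataset)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    n_test = n - n_train - n_val  # remainder goes to the test split
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test], generator=generator)
```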
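For reference, the quoted environment and training hyperparameters can be collected into a single configuration sketch. The dictionary layout and key names are illustrative; only the values come from the report (PyTorch 2.1.0, Transformers 4.31.0, participation rate 0.05, patience 3, batch size 4, max token length 512, and the learning-rate grids).

```python
# Reported environment: torch==2.1.0, transformers==4.31.0
FL_CONFIG = {
    "client_participation_rate": 0.05,   # fraction of clients sampled per round
    "early_stopping_patience": 3,        # stop if val loss stalls for 3 FL rounds
    "batch_size": 4,                     # selected from {2, 4, 16}
    "max_token_length": 512,
    "lr_grid": {
        "FedAvg": [5e-2, 5e-3, 5e-4],
        "FedIT": [5e-4, 1e-4, 5e-5, 1e-5],
        "SLoRA": [5e-4, 1e-4, 5e-5, 1e-5],
    },
    "lr_scheduler": "linear",            # decays from the initial LR to 0
}
```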