Federated Fine-tuning of Large Language Models under Heterogeneous Tasks and Client Resources

Authors: Jiamu Bai, Daoyuan Chen, Bingchen Qian, Liuyi Yao, Yaliang Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Involving thousands of clients with heterogeneous NLP tasks and client resources, our experiments validate the efficacy of FlexLoRA, with the federated global model achieving consistently better improvement over SOTA FL methods in downstream NLP task performance across various heterogeneous distributions.
Researcher Affiliation | Collaboration | Jiamu Bai (Pennsylvania State University, jvb6867@psu.edu); Daoyuan Chen (Alibaba Group, daoyuanchen.cdy@alibaba-inc.com); Bingchen Qian (Alibaba Group, qianbingchen.qbc@alibaba-inc.com); Liuyi Yao (Alibaba Group, yly287738@alibaba-inc.com); Yaliang Li (Alibaba Group, yaliang.li@alibaba-inc.com)
Pseudocode | Yes | We summarize the pseudocode of FlexLoRA in Algorithms 1 and 2. ... Algorithm 1 FlexLoRA for Federated Learning ... Algorithm 2 FlexLoRA Server Update (a hedged sketch of the server update appears after the table)
Open Source Code | Yes | Our code is made available at https://github.com/alibaba/FederatedScope/tree/FlexLoRA, inviting further research and application in real-world cross-device FL for LLMs.
Open Datasets | Yes | We further make FL clients task-heterogeneous by utilizing the natural instruction dataset [39]. ... We use the Dolly-15K dataset [9], which supports instruction tuning and includes 8 tasks in total.
Dataset Splits | Yes | For all the datasets we used, data for each client is partitioned into training, validation, and testing sets in a ratio of 8:1:1. (A per-client split sketch appears after the table.)
Hardware Specification | Yes | Our experiments are conducted on a cluster equipped with 16 NVIDIA A100 GPUs, each with 40GB or 80GB of memory.
Software Dependencies | Yes | All the experiments are implemented using the PyTorch package with version 2.1.0 and Huggingface's Transformers package with version 4.31.0.
Experiment Setup | Yes | All FL experiments are conducted with a client participation rate of 0.05 in each round, and with an early stopping mechanism that terminates training if the validation loss does not improve over 3 consecutive FL rounds. ... Besides, the batch size is set as 4 via searching from a range of {2, 4, 16}. The maximum token length is 512. ... we grid search their learning rates from a range of {5e-2, 5e-3, 5e-4} for FedAvg, and {5e-4, 1e-4, 5e-5, 1e-5} for FedIT and SLoRA, both accompanied with a linear scheduler which decays from the initial learning rate to 0. (The reported configuration is collected in a sketch after the table.)
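
The server update quoted in the Pseudocode row (Algorithm 2) can be illustrated with a short sketch. This is not the authors' released implementation: it assumes clients upload their full LoRA products B_k A_k together with local data sizes, and that the server redistributes the aggregated update to each client's rank via truncated SVD, as the paper describes. The function and variable names (flexlora_server_update, delta_ws, client_ranks) are illustrative.

```python
import torch

def flexlora_server_update(delta_ws, num_samples, client_ranks):
    """Aggregate full LoRA updates and redistribute them at each client's rank.

    delta_ws     : list of tensors, each the full LoRA product B_k @ A_k
                   with shape (d_out, d_in)
    num_samples  : list of local dataset sizes, used as aggregation weights
    client_ranks : dict mapping client id -> local LoRA rank r_k
    """
    weights = torch.tensor(num_samples, dtype=torch.float32)
    weights = weights / weights.sum()

    # FedAvg-style weighted average over the full-rank updates.
    global_delta = sum(w * dw for w, dw in zip(weights, delta_ws))

    # Decompose the aggregated update and truncate to each client's rank,
    # so the broadcast matrices match that client's local LoRA shapes.
    U, S, Vh = torch.linalg.svd(global_delta, full_matrices=False)
    per_client = {}
    for cid, r in client_ranks.items():
        sqrt_s = torch.sqrt(S[:r])
        B = U[:, :r] * sqrt_s            # (d_out, r)
        A = sqrt_s[:, None] * Vh[:r, :]  # (r, d_in)
        per_client[cid] = (B, A)
    return per_client
```

Truncating the SVD to rank r_k lets resource-constrained clients receive a low-rank approximation of the same aggregated update that higher-capacity clients receive at larger ranks.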
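The 8:1:1 per-client partition from the Dataset Splits row can be reproduced in a few lines. The helper name and fixed seed below are assumptions for illustration, not taken from the released code.

```python
import torch
from torch.utils.data import random_split

def split_client_data(dataset, seed=42):
    """Partition one client's local data into train/val/test at an 8:1:1 ratio."""
    n = len(dataset)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    n_test = n - n_train - n_val  # remainder goes to the test split
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test], generator=generator)
```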
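For reference, the quoted environment and training hyperparameters can be collected into a single configuration sketch. The dictionary layout and key names are illustrative; only the values come from the report (PyTorch 2.1.0, Transformers 4.31.0, participation rate 0.05, patience 3, batch size 4, max token length 512, and the learning-rate grids).

```python
# Reported environment: torch==2.1.0, transformers==4.31.0
FL_CONFIG = {
    "client_participation_rate": 0.05,   # fraction of clients sampled per round
    "early_stopping_patience": 3,        # stop if val loss stalls for 3 FL rounds
    "batch_size": 4,                     # selected from {2, 4, 16}
    "max_token_length": 512,
    "lr_grid": {
        "FedAvg": [5e-2, 5e-3, 5e-4],
        "FedIT": [5e-4, 1e-4, 5e-5, 1e-5],
        "SLoRA": [5e-4, 1e-4, 5e-5, 1e-5],
    },
    "lr_scheduler": "linear",            # decays from the initial LR to 0
}
```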