Dual-Personalizing Adapter for Federated Foundation Models

Authors: Yiyuan Yang, Guodong Long, Tao Shen, Jing Jiang, Michael Blumenstein

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The effectiveness of the proposed method has been evaluated on benchmark datasets across different NLP tasks with released code. Experimental results demonstrate that our method achieves state-of-the-art performance on benchmarks, and all data and code are released.
Researcher Affiliation | Academia | Yiyuan Yang (Yiyuan.Yang-1@student.uts.edu.au), Guodong Long (Guodong.Long@uts.edu.au), Tao Shen (Tao.Sheng@uts.edu.au), Jing Jiang (Jing.Jiang@uts.edu.au), and Michael Blumenstein (Michael.Blumenstein@uts.edu.au); all with the Australian AI Institute, Faculty of Engineering & IT, University of Technology Sydney.
Pseudocode | Yes | Algorithm 1 (FedDPA-F) and Algorithm 2 (FedDPA-T) in Appendix A.3.
Open Source Code | Yes | The effectiveness of the proposed method has been evaluated on benchmark datasets across different NLP tasks with released code. https://github.com/Lydia-yang/FedDPA
Open Datasets | Yes | We construct two federated datasets from Flan [31], which is a collection of various NLP tasks from over 60 datasets for instruction tuning. Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V. Le. Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652, 2021.
Dataset Splits | No | Section 3.1 mentions that "each client possesses its distinct local training dataset D^m_train and test dataset D^m_test". Appendix A.1 specifies "reducing the size of each selected local dataset to 300 training instances and 200 testing instances." The paper does not explicitly state the use of a separate validation split.
Hardware Specification | Yes | We implement all the methods using PyTorch and conduct all experiments on an NVIDIA Quadro RTX 6000 GPU.
Software Dependencies | No | We implement all the methods using PyTorch and conduct all experiments on an NVIDIA Quadro RTX 6000 GPU. All models are implemented using LoRA to enhance learning efficiency. (No version numbers are specified for PyTorch or LoRA.)
Experiment Setup | Yes | All models are implemented using LoRA to enhance learning efficiency, with the LoRA rank set to r = 8 and applied only to W_q and W_v. For FL methods, each client conducts 10 local epochs with a batch size of 32. The updating weight of local LoRA training (FedDPA-T) is α = 0.5 (λ = 0.5) for federated dataset 1 and α = 0.3 (λ = 0.3) for federated dataset 2. We set S = 5 and use cosine similarity for the instance-wise dynamic weighting mechanism.
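
To make the reported setup concrete, the snippet below is a minimal, self-contained PyTorch sketch of the dual-adapter idea the table describes: a frozen base linear layer augmented with a shared global LoRA and a client-specific local LoRA (rank r = 8, as applied to W_q and W_v), mixed per instance by a weight derived from cosine similarity against S = 5 reference representations. The class and function names, the initialization, and the exact way an instance is compared against stored descriptors are illustrative assumptions, not the authors' released FedDPA-F/FedDPA-T implementation (see their repository for that).

```python
# Hedged sketch only: names, init scales, and the descriptor comparison are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualLoRALinear(nn.Module):
    """Frozen base weight plus two rank-r updates: global (federated) and local (personal)."""

    def __init__(self, in_features, out_features, r=8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():          # foundation-model weights stay frozen
            p.requires_grad_(False)
        # Global LoRA: the adapter that would be aggregated across clients during FL.
        self.gA = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.gB = nn.Parameter(torch.zeros(out_features, r))
        # Local LoRA: kept on the client for personalization, never communicated.
        self.lA = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lB = nn.Parameter(torch.zeros(out_features, r))

    def forward(self, x, weight):
        # weight in [0, 1]: per-instance mixing coefficient between the two adapters.
        global_delta = F.linear(F.linear(x, self.gA), self.gB)
        local_delta = F.linear(F.linear(x, self.lA), self.lB)
        return self.base(x) + weight * global_delta + (1.0 - weight) * local_delta


def instance_weight(instance_repr, descriptors):
    """Instance-wise dynamic weight from cosine similarity (S descriptors, S = 5 above).

    `descriptors` stands in for whatever reference representations the method compares
    each instance against; averaging and clamping to [0, 1] are assumptions.
    """
    sims = F.cosine_similarity(instance_repr.unsqueeze(0), descriptors, dim=-1)
    return sims.mean().clamp(0.0, 1.0)


# Toy usage: a batch of 2 feature vectors, hidden size 16, S = 5 stored descriptors.
layer = DualLoRALinear(16, 16, r=8)
x = torch.randn(2, 16)
descriptors = torch.randn(5, 16)
w = instance_weight(x.mean(dim=0), descriptors)
out = layer(x, w)
print(out.shape, float(w))
```

In this sketch, FedDPA-T-style training would update both adapters locally with the reported mixing weight, while FedDPA-F-style training would freeze the aggregated global adapter and fit only the local one; the federated aggregation loop itself is omitted.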