Dual-Personalizing Adapter for Federated Foundation Models
Authors: Yiyuan Yang, Guodong Long, Tao Shen, Jing Jiang, Michael Blumenstein
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness of the proposed method has been evaluated on benchmark datasets across different NLP tasks with released code. Experimental results demonstrate that our method achieves state-of-the-art performance on benchmarks and all data and code are released. |
| Researcher Affiliation | Academia | Yiyuan Yang, Australian AI Institute, Faculty of Engineering & IT, University of Technology Sydney (Yiyuan.Yang-1@student.uts.edu.au); Guodong Long, Australian AI Institute, Faculty of Engineering & IT, University of Technology Sydney (Guodong.Long@uts.edu.au); Tao Shen, Australian AI Institute, Faculty of Engineering & IT, University of Technology Sydney (Tao.Sheng@uts.edu.au); Jing Jiang, Australian AI Institute, Faculty of Engineering & IT, University of Technology Sydney (Jing.Jiang@uts.edu.au); Michael Blumenstein, Australian AI Institute, Faculty of Engineering & IT, University of Technology Sydney (Michael.Blumenstein@uts.edu.au) |
| Pseudocode | Yes | Algorithm 1: FedDPA-F and Algorithm 2: FedDPA-T in Appendix A.3. |
| Open Source Code | Yes | The effectiveness of the proposed method has been evaluated on benchmark datasets across different NLP tasks with released code. https://github.com/Lydia-yang/FedDPA |
| Open Datasets | Yes | We construct two federated datasets from Flan [31], which is a collection of various NLP tasks from over 60 datasets for instruction tuning. Jason Wei, Maarten Bosma, Vincent Y Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M Dai, and Quoc V Le. Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652, 2021. |
| Dataset Splits | No | Section 3.1 mentions 'each client possesses its distinct local training dataset D^m_train and test dataset D^m_test'. Appendix A.1 specifies 'reducing the size of each selected local dataset to 300 training instances and 200 testing instances.' The paper does not explicitly state the use of a separate validation dataset split. |
| Hardware Specification | Yes | We implement all the methods using PyTorch and conduct all experiments on NVIDIA Quadro RTX 6000 GPU. |
| Software Dependencies | No | We implement all the methods using PyTorch and conduct all experiments on NVIDIA Quadro RTX 6000 GPU. All models are implemented using LoRA to enhance learning efficiency (No version numbers specified for PyTorch or LoRA). |
| Experiment Setup | Yes | All models are implemented using LoRA to enhance learning efficiency, with the rank of LoRA set as r = 8 and only applied to W_q and W_v. For FL methods, each client conducts 10 local epochs with a batch size of 32. The updating weight of local LoRA training (FedDPA-T) is α = 0.5 (λ = 0.5) for federated dataset 1 and α = 0.3 (λ = 0.3) for federated dataset 2. We set S = 5 and choose cosine similarity for the instance-wise dynamic weighting mechanism. (A hedged configuration sketch follows this table.) |
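
Below is a minimal sketch of how the reported setup could be approximated with Hugging Face PEFT and PyTorch. It is not the authors' released code: the target-module names `q_proj`/`v_proj` (standing in for W_q and W_v), the `lora_alpha`/`lora_dropout` values, and the `dynamic_weight`/`combine_adapter_outputs` helpers are illustrative assumptions; only r = 8, the restriction to W_q/W_v, S = 5, and the use of cosine similarity come from the setup described above.

```python
# Hedged reproduction sketch, NOT the authors' implementation.
import torch
import torch.nn.functional as F
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                  # LoRA rank r = 8 (from the reported setup)
    target_modules=["q_proj", "v_proj"],  # assumed module names for W_q and W_v
    lora_alpha=16,                        # assumption: not stated in the table
    lora_dropout=0.05,                    # assumption: not stated in the table
    task_type="CAUSAL_LM",
)
# model = get_peft_model(base_model, lora_config)  # attach adapters to a loaded base model


def dynamic_weight(instance_repr: torch.Tensor,
                   reference_reprs: torch.Tensor) -> torch.Tensor:
    """Illustrative instance-wise weight from cosine similarity.

    `reference_reprs` has shape (S, d) with S = 5 stored reference vectors;
    the averaging and clamping are assumptions about how a scalar weight
    could be derived, not the paper's exact formula.
    """
    sims = F.cosine_similarity(instance_repr.unsqueeze(0), reference_reprs, dim=-1)
    return sims.mean().clamp(0.0, 1.0)


def combine_adapter_outputs(global_out: torch.Tensor,
                            local_out: torch.Tensor,
                            weight: torch.Tensor) -> torch.Tensor:
    # One plausible convention: weight -> local adapter, (1 - weight) -> global adapter.
    return weight * local_out + (1.0 - weight) * global_out


# Toy usage with random tensors (d = 16, S = 5):
h = torch.randn(16)
refs = torch.randn(5, 16)
w = dynamic_weight(h, refs)
out = combine_adapter_outputs(torch.randn(16), torch.randn(16), w)
```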