Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
DoFIT: Domain-aware Federated Instruction Tuning with Alleviated Catastrophic Forgetting
Authors: Binqian Xu, Xiangbo Shu, Haiyang Mei, Zechen Bai, Basura Fernando, Mike Zheng Shou, Jinhui Tang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on diverse datasets consistently demonstrate that DoFIT excels in cross-domain collaborative training and exhibits significant advantages over conventional FIT methods in alleviating catastrophic forgetting. |
| Researcher Affiliation | Academia | Binqian Xu1, Xiangbo Shu1,*, Haiyang Mei2, Zechen Bai2, Basura Fernando3, Mike Zheng Shou2, and Jinhui Tang1 1Nanjing University of Science and Technology 2Show Lab, National University of Singapore 3Institute of High-Performance Computing, A*STAR |
| Pseudocode | Yes | A.1 Algorithm. Algorithm 1: The training process of DoFIT for two domains |
| Open Source Code | Yes | Code is available at https://github.com/1xbq1/DoFIT. |
| Open Datasets | Yes | We train our DoFIT on three datasets, i.e., FinGPT [36], Alpaca-GPT4 [23], and MedAlpaca [2] from the Finance (F), General (G), and Medical (M) domains, respectively. ... FPB [19], FiQA-SA [18], TFNS [17], and NWGI [33] are all the evaluation datasets on Finance domain. ... MedQA [10], and MedMCQA [22] are the evaluation datasets on M domain. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits with percentages or sample counts for a validation set. |
| Hardware Specification | Yes | In all experiments conducted on one NVIDIA A40, the frozen LLM used is Llama2-7B with 32 layers [27] quantized to int8. |
| Software Dependencies | No | The paper mentions Llama2-7B, LoRA, and the AdamW optimizer, but does not provide specific version numbers for these software components, nor for programming languages or libraries like Python or PyTorch. |
| Experiment Setup | Yes | In all experiments conducted on one NVIDIA A40, the frozen LLM used is Llama2-7B with 32 layers [27] quantized to int8. The LoRA rank and alpha are set to 32 and 64, respectively. The maximum sequence length is 512. Following the formatting instructions of Alpaca template [25], the training runs for 200 rounds, with a cosine learning rate scheduler adjusting the learning rate from 5e-5 to 1e-6. In each round, the selected clients are trained 10 steps by AdamW [16] optimizer. The batch size is set to 16. In FinGPT/Alpaca-GPT4/MedAlpaca training, total 10k/20k/20k samples for 50/20/20 clients, selecting 5/2/2 clients randomly per round. |
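
The round schedule quoted in the Experiment Setup row can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the exact cosine formula, the choice to update the learning rate once per round, the uniform client sampling, and all names (`cosine_lr`, `sample_clients`, `CLIENTS`) are assumptions. Only the numeric values come from the quoted text.

```python
import math
import random

# Values quoted in the Experiment Setup row.
NUM_ROUNDS = 200
LR_MAX = 5e-5
LR_MIN = 1e-6
LOCAL_STEPS = 10
BATCH_SIZE = 16

# (total clients, clients selected per round) for each training dataset.
CLIENTS = {"FinGPT": (50, 5), "Alpaca-GPT4": (20, 2), "MedAlpaca": (20, 2)}

# Samples each selected client processes in one round: 10 steps x 16 per batch.
SAMPLES_PER_CLIENT_PER_ROUND = LOCAL_STEPS * BATCH_SIZE


def cosine_lr(round_idx, num_rounds=NUM_ROUNDS, lr_max=LR_MAX, lr_min=LR_MIN):
    """Assumed standard cosine annealing, evaluated once per round."""
    progress = round_idx / (num_rounds - 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))


def sample_clients(dataset, rng):
    """Pick the per-round client subset uniformly without replacement (assumed)."""
    total, per_round = CLIENTS[dataset]
    return sorted(rng.sample(range(total), per_round))


if __name__ == "__main__":
    rng = random.Random(0)
    for r in (0, NUM_ROUNDS // 2, NUM_ROUNDS - 1):
        print(r, cosine_lr(r), sample_clients("FinGPT", rng))
```

Under these assumptions the learning rate starts at 5e-5 in the first round and reaches 1e-6 in the last. Whether the paper's scheduler steps per round or per optimizer step is not stated in the quoted text.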