Towards Diverse Device Heterogeneous Federated Learning via Task Arithmetic Knowledge Integration
Authors: Mahdi Morafah, Vyacheslav Kungurtsev, Hojin Chang, Chen Chen, Bill Lin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive evaluations of our method across both computer vision (CV) and natural language processing (NLP) tasks demonstrate that TAKFL achieves state-of-the-art results in a variety of datasets and settings, significantly outperforming existing KD-based methods. Our code is released at https://github.com/MMorafah/TAKFL. |
| Researcher Affiliation | Academia | Mahdi Morafah¹, Vyacheslav Kungurtsev², Hojin Chang¹, Chen Chen³, Bill Lin¹ — ¹University of California San Diego (UCSD), ²Czech Technical University in Prague, ³University of Central Florida (UCF) |
| Pseudocode | Yes | The full algorithm description of TAKFL is presented in Algorithm 1. |
| Open Source Code | Yes | Our code is released at https://github.com/MMorafah/TAKFL. |
| Open Datasets | Yes | For CV, we train image classification using CIFAR10/100 [24], CINIC10 [9], and Tiny ImageNet [25]. For NLP, we fine-tune pre-trained models for text classification on MNLI [52], SST-2 [45], MARC [22], and AG News [60]. |
| Dataset Splits | Yes | In our experiments, we considered this as a hyperparameter and tuned it manually or determined it using held-out validation sets, which yields similar results. More details can be found in Appendix F.3. |
| Hardware Specification | Yes | We use two NVIDIA RTX 3090 GPUs to conduct the entire experimentation in this paper. |
| Software Dependencies | No | We implement our entire code in PyTorch [38] using the FedZoo benchmark [36] and release it at https://github.com/MMorafah/TAKFL. The paper mentions PyTorch but does not specify a version number, nor does it list other software dependencies with version numbers. |
| Experiment Setup | Yes | We use the Adam optimizer for both CV and NLP tasks. For CV, local training involves 20 epochs with a learning rate of 0.001, weight decay of 5e-5, and a batch size of 64. NLP training is conducted over 1 epoch with a learning rate of 3e-5, no weight decay, and a batch size of 32. For distillation, Adam is used with a learning rate of 1e-5 and weight decay of 5e-4 for CV, and 3e-5 with no weight decay for NLP. Batch sizes for distillation are 128 for CV and 32 for NLP. The softmax temperature is set at 3 for both tasks, with a temperature of 20 for self-regularization. Further details are provided in Appendix F.1 and F.2. |
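The Experiment Setup row above reports the training and distillation hyperparameters in prose. The sketch below simply collects those reported values into a configuration dictionary and an optimizer helper; the names `HYPERPARAMS` and `build_optimizer` are illustrative and are not taken from the TAKFL codebase.

```python
# Hypothetical reproduction config assembled from the hyperparameters reported in the paper.
# Structure and names are illustrative only; values are the ones quoted above.
import torch

HYPERPARAMS = {
    "cv": {
        "local":   {"epochs": 20, "lr": 1e-3, "weight_decay": 5e-5, "batch_size": 64},
        "distill": {"lr": 1e-5, "weight_decay": 5e-4, "batch_size": 128},
    },
    "nlp": {
        "local":   {"epochs": 1, "lr": 3e-5, "weight_decay": 0.0, "batch_size": 32},
        "distill": {"lr": 3e-5, "weight_decay": 0.0, "batch_size": 32},
    },
    "softmax_temperature": 3.0,              # KD softmax temperature for both tasks
    "self_regularization_temperature": 20.0,  # temperature used for self-regularization
}


def build_optimizer(model: torch.nn.Module, task: str, phase: str) -> torch.optim.Adam:
    """Return an Adam optimizer matching the reported settings for the given task/phase."""
    cfg = HYPERPARAMS[task][phase]
    return torch.optim.Adam(model.parameters(), lr=cfg["lr"], weight_decay=cfg["weight_decay"])


if __name__ == "__main__":
    # Example: optimizer for CV local training on a toy model.
    model = torch.nn.Linear(32, 10)
    opt = build_optimizer(model, task="cv", phase="local")
    print(opt)
```

The two temperatures are recorded here only as reported values; how they enter the distillation and self-regularization losses is specified in the paper's Algorithm 1 and Appendices F.1-F.2.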
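For readers unfamiliar with the "task arithmetic" in the paper's title, the sketch below illustrates generic task-vector arithmetic, i.e., merging fine-tuned models in parameter space. This is a background illustration only, not TAKFL's Algorithm 1, and every function and variable name in it is hypothetical.

```python
# Minimal sketch of generic task-vector arithmetic:
#   theta_merged = theta_base + sum_i w_i * (theta_i - theta_base)
# Not TAKFL's Algorithm 1; names are hypothetical.
import copy
from typing import List

import torch


def merge_with_task_arithmetic(base: torch.nn.Module,
                               finetuned: List[torch.nn.Module],
                               weights: List[float]) -> torch.nn.Module:
    """Merge fine-tuned models into the base model via weighted task vectors."""
    base_sd = base.state_dict()
    merged_sd = copy.deepcopy(base_sd)
    with torch.no_grad():
        for model, w in zip(finetuned, weights):
            sd = model.state_dict()
            for name in merged_sd:
                # Task vector = fine-tuned parameters minus base parameters.
                merged_sd[name] = merged_sd[name] + w * (sd[name] - base_sd[name])
    merged = copy.deepcopy(base)
    merged.load_state_dict(merged_sd)
    return merged


if __name__ == "__main__":
    # Example: merge two fine-tuned copies of a tiny model with equal weights.
    base = torch.nn.Linear(8, 2)
    ft1, ft2 = copy.deepcopy(base), copy.deepcopy(base)
    with torch.no_grad():
        ft1.weight.add_(0.1)
        ft2.weight.add_(-0.1)
    merged = merge_with_task_arithmetic(base, [ft1, ft2], [0.5, 0.5])
    # Symmetric task vectors cancel, so the merged weights match the base (up to float error).
    print(merged.weight.allclose(base.weight))
```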