Towards Diverse Device Heterogeneous Federated Learning via Task Arithmetic Knowledge Integration

Authors: Mahdi Morafah, Vyacheslav Kungurtsev, Hojin Chang, Chen Chen, Bill Lin

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive evaluations of our method across both computer vision (CV) and natural language processing (NLP) tasks demonstrate that TAKFL achieves state-of-the-art results in a variety of datasets and settings, significantly outperforming existing KD-based methods. Our code is released at https://github.com/MMorafah/TAKFL.
Researcher Affiliation | Academia | Mahdi Morafah (1), Vyacheslav Kungurtsev (2), Hojin Chang (1), Chen Chen (3), Bill Lin (1). Affiliations: (1) University of California San Diego (UCSD), (2) Czech Technical University in Prague, (3) University of Central Florida (UCF).
Pseudocode | Yes | The full algorithm description of TAKFL is presented in Algorithm 1.
Open Source Code | Yes | Our code is released at https://github.com/MMorafah/TAKFL.
Open Datasets | Yes | For CV, we train image classification using CIFAR10/100 [24], CINIC10 [9], and Tiny ImageNet [25]. For NLP, we fine-tune pre-trained models for text classification on MNLI [52], SST-2 [45], MARC [22], and AG News [60]. (A minimal loading sketch for the CV datasets follows the table.)
Dataset Splits | Yes | In our experiments, we considered this as a hyperparameter and tuned it manually or determined it using held-out validation sets; both approaches achieve similar results. More details can be found in Appendix F.3.
Hardware Specification | Yes | We use two NVIDIA RTX 3090 GPUs to conduct all experiments in this paper.
Software Dependencies | No | We implement our entire code in PyTorch [38] using the FedZoo benchmark [36] and release it at https://github.com/MMorafah/TAKFL. The paper mentions PyTorch but does not give a specific version number, nor does it list other software dependencies with version numbers.
Experiment Setup | Yes | We use the Adam optimizer for both CV and NLP tasks. For CV, local training involves 20 epochs with a learning rate of 0.001, weight decay of 5e-5, and a batch size of 64. NLP training is conducted over 1 epoch with a learning rate of 3e-5, no weight decay, and a batch size of 32. For distillation, Adam is used with a learning rate of 1e-5 and weight decay of 5e-4 for CV, and 3e-5 with no weight decay for NLP. Batch sizes for distillation are 128 for CV and 32 for NLP. The softmax temperature is set at 3 for both tasks, with a temperature of 20 for self-regularization. Further details are provided in Appendix F.1 and F.2.
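
For readers who want to reproduce the reported configuration, the Experiment Setup row maps directly onto standard PyTorch calls. The snippet below is a minimal sketch of the CV settings only; the model objects are placeholders, and the distillation loss is a generic temperature-scaled KL divergence rather than code taken from the authors' released implementation.

```python
import torch
import torch.nn.functional as F

# Placeholder modules standing in for the heterogeneous client/server
# architectures used in the paper; any nn.Module would slot in here.
client_model = torch.nn.Linear(512, 10)
server_model = torch.nn.Linear(512, 10)

# Local CV training: Adam, lr 0.001, weight decay 5e-5 (20 epochs, batch size 64).
local_optimizer = torch.optim.Adam(
    client_model.parameters(), lr=1e-3, weight_decay=5e-5
)

# Server-side CV distillation: Adam, lr 1e-5, weight decay 5e-4 (batch size 128).
# NLP fine-tuning and NLP distillation both use Adam with lr 3e-5 and no weight decay.
distill_optimizer = torch.optim.Adam(
    server_model.parameters(), lr=1e-5, weight_decay=5e-4
)

TEMPERATURE = 3.0           # softmax temperature reported for both CV and NLP
SELF_REG_TEMPERATURE = 20.0 # temperature reported for self-regularization

def kd_loss(student_logits, teacher_logits, temperature=TEMPERATURE):
    """Generic temperature-scaled KL-divergence distillation loss (sketch only)."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```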
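The CV datasets listed in the Open Datasets row are all publicly available. The sketch below shows one assumed way to obtain them: CIFAR-10/100 ship with torchvision, while CINIC-10 and Tiny ImageNet must be downloaded separately and can then be read with a generic ImageFolder loader; the paths shown are placeholders, not the layout used in the TAKFL repository.

```python
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([transforms.ToTensor()])

# CIFAR-10 and CIFAR-100 are bundled with torchvision and download automatically.
cifar10_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform
)
cifar100_train = torchvision.datasets.CIFAR100(
    root="./data", train=True, download=True, transform=transform
)

# CINIC-10 and Tiny ImageNet are not bundled with torchvision; after downloading
# them from their official sources, the per-class folder layout can be read with
# a generic ImageFolder loader (the path below is a placeholder).
# cinic10_train = torchvision.datasets.ImageFolder(
#     "./data/CINIC-10/train", transform=transform
# )
```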