Towards Diverse Device Heterogeneous Federated Learning via Task Arithmetic Knowledge Integration

Authors: Mahdi Morafah, Vyacheslav Kungurtsev, Hojin Chang, Chen Chen, Bill Lin

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive evaluations of our method across both computer vision (CV) and natural language processing (NLP) tasks demonstrate that TAKFL achieves state-of-the-art results in a variety of datasets and settings, significantly outperforming existing KD-based methods. Our code is released at https://github.com/MMorafah/TAKFL.
Researcher Affiliation | Academia | Mahdi Morafah (1), Vyacheslav Kungurtsev (2), Hojin Chang (1), Chen Chen (3), Bill Lin (1). Affiliations: (1) University of California San Diego (UCSD), (2) Czech Technical University in Prague, (3) University of Central Florida (UCF).
Pseudocode | Yes | The full algorithm description of TAKFL is presented in Algorithm 1.
Open Source Code | Yes | Our code is released at https://github.com/MMorafah/TAKFL.
Open Datasets | Yes | For CV, we train image classification using CIFAR10/100 [24], CINIC10 [9], and Tiny ImageNet [25]. For NLP, we fine-tune pre-trained models for text classification on MNLI [52], SST-2 [45], MARC [22], and AG News [60]. (A minimal loading sketch for the CV datasets follows the table.)
Dataset Splits | Yes | In our experiments, we considered this as a hyperparameter and tuned it manually or determined it using held-out validation sets; both approaches achieve similar results. More details can be found in Appendix F.3.
Hardware Specification | Yes | We use two NVIDIA RTX 3090 GPUs to conduct all experiments in this paper.
Software Dependencies | No | We implement our entire code in PyTorch [38] using the FedZoo benchmark [36] and release it at https://github.com/MMorafah/TAKFL. The paper mentions PyTorch but does not give a specific version number, nor does it list other software dependencies with version numbers.
Experiment Setup | Yes | We use the Adam optimizer for both CV and NLP tasks. For CV, local training involves 20 epochs with a learning rate of 0.001, weight decay of 5e-5, and a batch size of 64. NLP training is conducted over 1 epoch with a learning rate of 3e-5, no weight decay, and a batch size of 32. For distillation, Adam is used with a learning rate of 1e-5 and weight decay of 5e-4 for CV, and 3e-5 with no weight decay for NLP. Batch sizes for distillation are 128 for CV and 32 for NLP. The softmax temperature is set at 3 for both tasks, with a temperature of 20 for self-regularization. Further details are provided in Appendix F.1 and F.2.
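
For readers who want to reproduce the reported configuration, the Experiment Setup row maps directly onto standard PyTorch calls. The snippet below is a minimal sketch of the CV settings only; the model objects are placeholders, and the distillation loss is a generic temperature-scaled KL divergence rather than code taken from the authors' released implementation.

```python
import torch
import torch.nn.functional as F

# Placeholder modules standing in for the heterogeneous client/server
# architectures used in the paper; any nn.Module would slot in here.
client_model = torch.nn.Linear(512, 10)
server_model = torch.nn.Linear(512, 10)

# Local CV training: Adam, lr 0.001, weight decay 5e-5 (20 epochs, batch size 64).
local_optimizer = torch.optim.Adam(
    client_model.parameters(), lr=1e-3, weight_decay=5e-5
)

# Server-side CV distillation: Adam, lr 1e-5, weight decay 5e-4 (batch size 128).
# NLP fine-tuning and NLP distillation both use Adam with lr 3e-5 and no weight decay.
distill_optimizer = torch.optim.Adam(
    server_model.parameters(), lr=1e-5, weight_decay=5e-4
)

TEMPERATURE = 3.0           # softmax temperature reported for both CV and NLP
SELF_REG_TEMPERATURE = 20.0 # temperature reported for self-regularization

def kd_loss(student_logits, teacher_logits, temperature=TEMPERATURE):
    """Generic temperature-scaled KL-divergence distillation loss (sketch only)."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```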
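The CV datasets listed in the Open Datasets row are all publicly available. The sketch below shows one assumed way to obtain them: CIFAR-10/100 ship with torchvision, while CINIC-10 and Tiny ImageNet must be downloaded separately and can then be read with a generic ImageFolder loader; the paths shown are placeholders, not the layout used in the TAKFL repository.

```python
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([transforms.ToTensor()])

# CIFAR-10 and CIFAR-100 are bundled with torchvision and download automatically.
cifar10_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform
)
cifar100_train = torchvision.datasets.CIFAR100(
    root="./data", train=True, download=True, transform=transform
)

# CINIC-10 and Tiny ImageNet are not bundled with torchvision; after downloading
# them from their official sources, the per-class folder layout can be read with
# a generic ImageFolder loader (the path below is a placeholder).
# cinic10_train = torchvision.datasets.ImageFolder(
#     "./data/CINIC-10/train", transform=transform
# )
```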