Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Towards Diverse Device Heterogeneous Federated Learning via Task Arithmetic Knowledge Integration
Authors: Mahdi Morafah, Vyacheslav Kungurtsev, Hojin Chang, Chen Chen, Bill Lin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive evaluations of our method across both computer vision (CV) and natural language processing (NLP) tasks demonstrate that TAKFL achieves state-of-the-art results in a variety of datasets and settings, significantly outperforming existing KD-based methods. Our code is released at https://github.com/MMorafah/TAKFL. |
| Researcher Affiliation | Academia | Mahdi Morafah¹, Vyacheslav Kungurtsev², Hojin Chang¹, Chen Chen³, Bill Lin¹; ¹University of California San Diego (UCSD), ²Czech Technical University in Prague, ³University of Central Florida (UCF) |
| Pseudocode | Yes | The full algorithm description of TAKFL is presented in Algorithm 1. |
| Open Source Code | Yes | Our code is released at https://github.com/MMorafah/TAKFL. |
| Open Datasets | Yes | For CV, we train image classification using CIFAR10/100 [24], CINIC10 [9], and Tiny Imagenet [25]. For NLP, we fine-tune pre-trained models for text classification on MNLI [52], SST-2 [45], MARC [22], and AG News [60]. |
| Dataset Splits | Yes | In our experiments, we considered this as a hyperparameter and tuned it manually or determined it using held-out validation sets which achieves similar results. More details can be found in Appendix F.3. |
| Hardware Specification | Yes | We use two NVIDIA RTX 3090 GPUs to conduct the entire experimentation in this paper. |
| Software Dependencies | No | We implement our entire code in PyTorch [38] using FedZoo benchmark [36] and release it at https://github.com/MMorafah/TAKFL. The paper mentions PyTorch but gives no specific version number, nor does it list other software dependencies with version numbers. |
| Experiment Setup | Yes | We use the Adam optimizer for both CV and NLP tasks. For CV, local training involves 20 epochs with a learning rate of 0.001, weight decay of 5e-5, and a batch size of 64. NLP training is conducted over 1 epoch with a learning rate of 3e-5, no weight decay, and a batch size of 32. For distillation, Adam is used with a learning rate of 1e-5 and weight decay of 5e-4 for CV, and 3e-5 with no weight decay for NLP. Batch sizes for distillation are 128 for CV and 32 for NLP. The softmax temperature is set at 3 for both tasks, with a temperature of 20 for self-regularization. Further details are provided in Appendix F.1 and F.2. |
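
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, which may be convenient when attempting to reproduce the runs. The following is a minimal sketch, not the authors' code: the class and field names (`PhaseConfig`, `TaskConfig`, `local`, `distill`) are hypothetical, and it assumes that the self-regularization temperature of 20 applies to both CV and NLP, which the quoted text does not state explicitly. Values in Appendix F.1 and F.2 of the paper take precedence over anything here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseConfig:
    """Optimizer settings for one training phase (hypothetical structure)."""
    optimizer: str
    learning_rate: float
    weight_decay: float
    batch_size: int


@dataclass(frozen=True)
class TaskConfig:
    """Settings for one task family, as quoted in the Experiment Setup row."""
    local: PhaseConfig            # client-side local training
    local_epochs: int
    distill: PhaseConfig          # server-side distillation
    softmax_temperature: float    # distillation softmax temperature
    self_reg_temperature: float   # assumed to be shared by CV and NLP


# Computer vision: 20 local epochs, Adam for both phases.
CV = TaskConfig(
    local=PhaseConfig("adam", learning_rate=1e-3, weight_decay=5e-5, batch_size=64),
    local_epochs=20,
    distill=PhaseConfig("adam", learning_rate=1e-5, weight_decay=5e-4, batch_size=128),
    softmax_temperature=3.0,
    self_reg_temperature=20.0,
)

# NLP: 1 local epoch, no weight decay in either phase.
NLP = TaskConfig(
    local=PhaseConfig("adam", learning_rate=3e-5, weight_decay=0.0, batch_size=32),
    local_epochs=1,
    distill=PhaseConfig("adam", learning_rate=3e-5, weight_decay=0.0, batch_size=32),
    softmax_temperature=3.0,
    self_reg_temperature=20.0,
)

CONFIGS = {"cv": CV, "nlp": NLP}
```

Freezing the dataclasses keeps the quoted values from being modified by accident during a run; a lookup such as `CONFIGS["cv"].distill.learning_rate` returns the distillation learning rate for the CV setting.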