Spectral Co-Distillation for Personalized Federated Learning
Authors: Zihan Chen, Howard Yang, Tony Quek, Kai Fong Ernest Chong
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on multiple datasets over diverse heterogeneous data settings, we demonstrate the outperformance and efficacy of our proposed spectral co-distillation method, as well as our wait-free training protocol. |
| Researcher Affiliation | Academia | Zihan Chen¹, Howard H. Yang², Tony Q.S. Quek¹, and Kai Fong Ernest Chong¹; ¹Singapore University of Technology and Design (SUTD); ²Zhejiang University/University of Illinois Urbana-Champaign Institute, Zhejiang University |
| Pseudocode | Yes | Algorithm 1 Spectral Co-Distillation with Wait-free Training for PFL+ |
| Open Source Code | No | The paper does not provide any explicit statement about the availability of open-source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We evaluated our proposed PFL+ framework with N clients on CIFAR-10/100 [51] and iNaturalist-2017 [53]. |
| Dataset Splits | No | The paper describes its use of local training sets and local/global test sets, and how the data is partitioned, but it does not specify a separate validation split (as percentages or counts) for model tuning or early stopping. |
| Hardware Specification | No | The paper states ‘All experiments were implemented using Pytorch’ but does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions ‘All experiments were implemented using Pytorch’ but does not specify the version number of PyTorch or any other software dependencies, making it difficult to reproduce the software environment. |
| Experiment Setup | Yes | For all methods, we used an SGD local optimizer with a momentum of 0.5 and with no weight decay. We train all methods over a total number of T = 500 global communication rounds. Batch sizes for CIFAR-10/100 [51] and iNaturalist-2017 [53] are 10 and 128, respectively. For our proposed method, we used a learning rate of 0.01 (resp. 0.003) for both η_G and η_p when training on CIFAR-10/100 (resp. iNaturalist-2017). (A minimal configuration sketch follows the table.) |
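
Since the paper reports the local optimizer and hyperparameters but not its full software environment or code, the following is a minimal sketch of how the stated setup could be wired up in PyTorch. Only the quoted values (SGD with momentum 0.5, no weight decay, T = 500 rounds, batch sizes 10/128, learning rates 0.01/0.003 for both the generic-model rate η_G and the personalized-model rate η_p) come from the paper; the function name, the two-model split, and everything else here are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the reported training configuration (PyTorch version unspecified in the paper).
# Model architectures, client data partitioning, and the spectral co-distillation
# loss itself are NOT reproduced here; only the quoted hyperparameters are.
import torch

NUM_ROUNDS = 500   # T = 500 global communication rounds (as reported)
MOMENTUM = 0.5     # SGD momentum (as reported); weight decay is disabled

# Reported per-dataset settings: batch size and the shared learning rate
# used for both eta_G (generic model) and eta_p (personalized model).
CONFIG = {
    "cifar":       {"lr": 0.01,  "batch_size": 10},   # CIFAR-10/100
    "inaturalist": {"lr": 0.003, "batch_size": 128},  # iNaturalist-2017
}


def make_local_optimizers(generic_model, personalized_model, dataset="cifar"):
    """Build the local SGD optimizers for a client's generic and personalized
    models, using the hyperparameters quoted in the paper's experiment setup.
    The two-model structure is an assumption based on the PFL+ description."""
    lr = CONFIG[dataset]["lr"]
    opt_generic = torch.optim.SGD(
        generic_model.parameters(), lr=lr, momentum=MOMENTUM, weight_decay=0.0
    )
    opt_personal = torch.optim.SGD(
        personalized_model.parameters(), lr=lr, momentum=MOMENTUM, weight_decay=0.0
    )
    return opt_generic, opt_personal
```

Because the paper does not pin a PyTorch version or any other dependency, a reproduction would still need to fix those choices independently.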