Spectral Co-Distillation for Personalized Federated Learning

Authors: Zihan Chen, Howard Yang, Tony Quek, Kai Fong Ernest Chong

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments on multiple datasets over diverse heterogeneous data settings, we demonstrate the outperformance and efficacy of our proposed spectral co-distillation method, as well as our wait-free training protocol.
Researcher Affiliation | Academia | Zihan Chen (1), Howard H. Yang (2), Tony Q.S. Quek (1), and Kai Fong Ernest Chong (1); (1) Singapore University of Technology and Design (SUTD); (2) Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University
Pseudocode | Yes | Algorithm 1: Spectral Co-Distillation with Wait-free Training for PFL+
Open Source Code | No | The paper does not provide any explicit statement about the availability of open-source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | We evaluated our proposed PFL+ framework with N clients on CIFAR-10/100 [51] and iNaturalist-2017 [53]
Dataset Splits | No | The paper describes its use of local training sets and local/global test sets, and how the data is partitioned, but it does not explicitly specify a separate validation split (with percentages or counts) for model tuning or early stopping.
Hardware Specification | No | The paper states ‘All experiments were implemented using Pytorch’ but does not provide any specific hardware details, such as GPU models, CPU types, or memory specifications, used for running the experiments.
Software Dependencies | No | The paper mentions ‘All experiments were implemented using Pytorch’ but does not specify the version number of PyTorch or any other software dependencies, making it difficult to reproduce the software environment.
Experiment Setup | Yes | For all methods, we used an SGD local optimizer with a momentum of 0.5 and with no weight decay. We train all methods over a total number of T = 500 global communication rounds. Batch sizes for CIFAR-10/100 [51] and iNaturalist-2017 [53] are 10 and 128, respectively. For our proposed method, we used a learning rate of 0.01 (resp. 0.003) for both η_G and η_p when training on CIFAR-10/100 (resp. iNaturalist-2017).
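For readers attempting to reproduce the reported setup, the quoted hyperparameters map directly onto a PyTorch optimizer configuration. The sketch below is only an illustration under stated assumptions: the paper's code is not released, so the helper name `make_local_optimizer`, the placeholder client models, and the commented usage lines are hypothetical; only the hyperparameter values (SGD, momentum 0.5, no weight decay, T = 500 rounds, batch sizes 10/128, learning rates 0.01/0.003 for η_G and η_p) come from the quoted experiment setup.

```python
# Hedged sketch of the reported local-training configuration.
# Only the hyperparameter values are taken from the paper's quoted setup;
# model classes, datasets, and the training loop are hypothetical placeholders.
import torch
from torch import nn

NUM_ROUNDS = 500          # T = 500 global communication rounds
BATCH_SIZE_CIFAR = 10     # batch size reported for CIFAR-10/100
BATCH_SIZE_INAT = 128     # batch size reported for iNaturalist-2017
LR_CIFAR = 0.01           # learning rate for both η_G and η_p on CIFAR-10/100
LR_INAT = 0.003           # learning rate for both η_G and η_p on iNaturalist-2017


def make_local_optimizer(model: nn.Module, lr: float) -> torch.optim.SGD:
    """SGD local optimizer with momentum 0.5 and no weight decay, as reported."""
    return torch.optim.SGD(model.parameters(), lr=lr, momentum=0.5, weight_decay=0.0)


# Hypothetical usage on a single client (CIFAR-10/100 setting):
# generic_model, personalized_model = ClientModel(), ClientModel()
# opt_generic = make_local_optimizer(generic_model, LR_CIFAR)            # η_G
# opt_personalized = make_local_optimizer(personalized_model, LR_CIFAR)  # η_p
# loader = torch.utils.data.DataLoader(client_dataset,
#                                      batch_size=BATCH_SIZE_CIFAR,
#                                      shuffle=True)
```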