SemiFL: Semi-Supervised Federated Learning for Unlabeled Clients with Alternate Training

Authors: Enmao Diao, Jie Ding, Vahid Tarokh

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments and demonstrate that our approach significantly improves the performance of a labeled server with unlabeled clients training with multiple local epochs. Moreover, our method outperforms many existing SSFL baselines and performs competitively with the state-of-the-art FL and SSL results. Our code is available here.
Researcher Affiliation | Academia | Enmao Diao, Department of Electrical and Computer Engineering, Duke University, Durham, NC 27705, USA, enmao.diao@duke.edu; Jie Ding, School of Statistics, University of Minnesota-Twin Cities, Minneapolis, MN 55455, USA, dingj@umn.edu; Vahid Tarokh, Department of Electrical and Computer Engineering, Duke University, Durham, NC 27705, USA, vahid.tarokh@duke.edu
Pseudocode | Yes | Algorithm 1: Semi-Supervised Federated Learning with Alternate Training for Unlabeled Clients. (A sketch of this alternate-training loop is given below the table.)
Open Source Code | Yes | Our code is available here. (Also from Ethics Checklist 3.a: We provide source codes in the supplementary material.)
Open Datasets | Yes | To evaluate our proposed method, we conduct experiments with CIFAR10, SVHN, and CIFAR100 datasets [38, 39]. (From Ethics Checklist 4.a: We cite the publicly available datasets we use.) See the dataset-loading sketch below the table.
Dataset Splits | Yes | The number of labeled data at the server, N_S, is {250, 4000} for CIFAR10, {100, 2500} for SVHN, and {2500, 10000} for CIFAR100. We uniformly assign the same number of data examples for IID data partition to each client. For a balanced Non-IID data partition, we ensure each client has data at most from K classes and the sample size of each class is the same. We set K = 2... For unbalanced Non-IID data partition, we sample data for each client from a Dirichlet distribution Dir(α) [41, 42]. We perform experiments with α = {0.1, 0.3}. (From Ethics Checklist 3.b: See Section C.1 in Appendix.) See the Dirichlet partition sketch below the table.
Hardware Specification | Yes | One Nvidia 1080TI is enough for one experiment run.
Software Dependencies | No | The paper names the models and methods it builds on, such as the Wide ResNet architecture and the SSL methods FixMatch and MixMatch, but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | We have 100 clients throughout our experiments, and the activity rate per communication round is C = 0.1. We uniformly assign the same number of data examples for IID data partition to each client. The number of labeled data at the server, N_S, is {250, 4000} for CIFAR10, {100, 2500} for SVHN, and {2500, 10000} for CIFAR100. Algorithm 1 lists parameters such as E (local training epochs), B_s and B_m (batch sizes), η (local learning rate), τ (confidence threshold), a (Mixup hyperparameter), and λ (loss hyperparameter, set to one). (A configuration skeleton is given below the table.)
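The Pseudocode row refers to the paper's Algorithm 1, which alternates server training on labeled data with local training on unlabeled clients. Below is a minimal sketch of that loop, assuming FedAvg-style weight averaging, FixMatch-style confidence thresholding with τ, plain SGD, and generic PyTorch data loaders; the function names (`server_finetune`, `client_update`, `fedavg`) and optimizer settings are ours, and the paper's Mixup, augmentation, and static batch-normalization components are omitted. It is an illustration, not the authors' implementation.

```python
# Minimal sketch of a SemiFL-style alternate-training loop (illustrative only).
import copy
import random
import torch
import torch.nn.functional as F

def fedavg(state_dicts):
    """Uniformly average client state dicts, preserving each tensor's dtype."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        avg[key] = stacked.mean(dim=0).to(avg[key].dtype)
    return avg

def server_finetune(model, labeled_loader, lr=0.03, epochs=1):
    """Fine-tune the global model on the server's labeled data."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in labeled_loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    return model

def client_update(global_model, unlabeled_loader, tau=0.95, lr=0.03, local_epochs=5):
    """Local training on pseudo-labels whose confidence exceeds tau."""
    model = copy.deepcopy(global_model)
    global_model.eval()                          # fixed teacher for pseudo-labeling
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(local_epochs):
        for x, _ in unlabeled_loader:            # client labels are never used
            with torch.no_grad():
                probs = F.softmax(global_model(x), dim=1)
                conf, pseudo = probs.max(dim=1)
            mask = conf >= tau                   # keep only confident examples
            if mask.sum() == 0:
                continue
            opt.zero_grad()
            F.cross_entropy(model(x[mask]), pseudo[mask]).backward()
            opt.step()
    return model.state_dict()

def train(global_model, labeled_loader, client_loaders, rounds=100, activity_rate=0.1):
    """Alternate server fine-tuning with local training on a sampled client subset."""
    for _ in range(rounds):
        global_model = server_finetune(global_model, labeled_loader)
        num_active = max(1, int(activity_rate * len(client_loaders)))
        active = random.sample(client_loaders, num_active)
        states = [client_update(global_model, loader) for loader in active]
        global_model.load_state_dict(fedavg(states))
    return global_model
```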
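The Open Datasets row names three public datasets. A minimal way to obtain them is through torchvision; this shows only the standard download route, not necessarily the authors' exact data pipeline or preprocessing.

```python
# Fetch the three public datasets used in the paper via torchvision.
from torchvision import datasets

cifar10  = datasets.CIFAR10("./data", train=True, download=True)
svhn     = datasets.SVHN("./data", split="train", download=True)
cifar100 = datasets.CIFAR100("./data", train=True, download=True)
```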
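The Dataset Splits row describes an unbalanced Non-IID partition drawn from Dir(α). The sketch below shows the common recipe cited by the paper [41, 42], assuming each class is divided across clients in proportions drawn from a Dirichlet distribution; the function name and seeding are ours, and the balanced K-class partition is not shown.

```python
# Hypothetical sketch of an unbalanced Non-IID split via Dir(alpha).
import numpy as np

def dirichlet_partition(labels, num_clients=100, alpha=0.3, seed=0):
    """Split example indices across clients; per-class shares are drawn from Dir(alpha)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet(alpha * np.ones(num_clients))   # class-c share per client
        cuts = (np.cumsum(props) * len(idx)).astype(int)[:-1]
        for client_id, shard in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(shard.tolist())
    return client_indices

# Example: alpha = 0.1 produces a more skewed split than alpha = 0.3.
# parts = dirichlet_partition(train_labels, num_clients=100, alpha=0.1)
```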
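The Experiment Setup row lists the hyperparameters Algorithm 1 exposes. The skeleton below simply collects them in one place; only the client count, activity rate C, λ = 1, and the N_S choices are taken from the excerpt, while every other default is an unspecified placeholder rather than the authors' value.

```python
# Hypothetical configuration skeleton for the hyperparameters named in the excerpt.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SemiFLConfig:
    num_clients: int = 100                     # "100 clients throughout our experiments"
    activity_rate: float = 0.1                 # C = 0.1 active clients per round
    num_labeled: int = 4000                    # N_S: {250, 4000} CIFAR10, {100, 2500} SVHN, {2500, 10000} CIFAR100
    local_epochs: Optional[int] = None         # E: value not given in this excerpt
    server_batch_size: Optional[int] = None    # B_s: value not given in this excerpt
    client_batch_size: Optional[int] = None    # B_m: value not given in this excerpt
    local_lr: Optional[float] = None           # eta: value not given in this excerpt
    conf_threshold: Optional[float] = None     # tau: value not given in this excerpt
    mixup_a: Optional[float] = None            # a (Mixup): value not given in this excerpt
    loss_lambda: float = 1.0                   # lambda, "set to one"
```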