Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
SemiFL: Semi-Supervised Federated Learning for Unlabeled Clients with Alternate Training
Authors: Enmao Diao, Jie Ding, Vahid Tarokh
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments and demonstrate that our approach significantly improves the performance of a labeled server with unlabeled clients training with multiple local epochs. Moreover, our method outperforms many existing SSFL baselines and performs competitively with the state-of-the-art FL and SSL results. Our code is available here. |
| Researcher Affiliation | Academia | Enmao Diao Department of Electrical and Computer Engineering Duke University Durham, NC 27705, USA EMAIL; Jie Ding School of Statistics University of Minnesota-Twin Cities Minneapolis, MN 55455, USA EMAIL; Vahid Tarokh Department of Electrical and Computer Engineering Duke University Durham, NC 27705, USA EMAIL |
| Pseudocode | Yes | Algorithm 1 Semi-Supervised Federated Learning with Alternate Training for Unlabeled Clients |
| Open Source Code | Yes | Our code is available here. (Also from Ethics Checklist 3.a: We provide source codes in the supplementary material.) |
| Open Datasets | Yes | To evaluate our proposed method, we conduct experiments with CIFAR10, SVHN, and CIFAR100 datasets [38,39]. (From Ethics Checklist 4.a: We cite the publicly available datasets we use.) |
| Dataset Splits | Yes | The number of labeled data at the server for CIFAR10, SVHN, and CIFAR100 datasets NS are {250, 4000}, {100, 2500}, and {2500, 10000} respectively. We uniformly assign the same number of data examples for IID data partition to each client. For a balanced Non-IID data partition, we ensure each client has data at most from K classes and the sample size of each class is the same. We set K = 2... For unbalanced Non-IID data partition, we sample data for each client from a Dirichlet distribution Dir(α) [41,42]. We perform experiments with α = {0.1, 0.3}. (From Ethics Checklist 3.b: See Section C.1 in Appendix.) |
| Hardware Specification | Yes | One Nvidia 1080TI is enough for one experiment run. |
| Software Dependencies | No | The paper mentions software used, such as WideResNet and specific SSL methods (FixMatch, MixMatch), but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We have 100 clients throughout our experiments, and the activity rate per communication round is C = 0.1. We uniformly assign the same number of data examples for IID data partition to each client. The number of labeled data at the server for CIFAR10, SVHN, and CIFAR100 datasets NS are {250, 4000}, {100, 2500}, and {2500, 10000} respectively. Algorithm 1 lists parameters such as E (local training epochs), Bs and Bm (batch sizes), η (local learning rate), τ (confidence threshold), a (Mixup hyperparameter), and λ (loss hyperparameter, set to one). |
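The data-partitioning and client-sampling details quoted above (100 clients, activity rate C = 0.1, unbalanced non-IID splits via Dir(α)) can be sketched as follows. This is a minimal illustration, not code from the SemiFL repository; the function names and the synthetic label array are invented for the example.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=100, alpha=0.3, seed=0):
    """Unbalanced non-IID split: for each class, draw per-client
    proportions from Dir(alpha) and allocate that class's samples
    accordingly (smaller alpha -> more skewed clients)."""
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # fraction of class-c samples assigned to each client
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

def sample_active_clients(num_clients=100, activity_rate=0.1, seed=0):
    """Each communication round, a fraction C of clients participates
    (C = 0.1 in the paper's experiments)."""
    rng = np.random.default_rng(seed)
    m = max(1, int(activity_rate * num_clients))
    return rng.choice(num_clients, size=m, replace=False)

# Synthetic CIFAR10-shaped labels: 10 classes x 5000 samples
labels = np.repeat(np.arange(10), 5000)
parts = dirichlet_partition(labels, num_clients=100, alpha=0.3)
assert sum(len(p) for p in parts) == len(labels)  # every sample assigned once
active = sample_active_clients()
print(len(active))  # 10 active clients per round at C = 0.1
```

The same helper could be run with α = 0.1 to reproduce the paper's more skewed partition; the balanced non-IID setting (each client holding at most K = 2 classes) would require a different allocation rule than the Dirichlet draw shown here.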