Federated Learning with Bilateral Curation for Partially Class-Disjoint Data
Authors: Ziqing Fan, Ruipeng Zhang, Jiangchao Yao, Bo Han, Ya Zhang, Yanfeng Wang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on a range of datasets to demonstrate that our FedGELA achieves promising performance (averaged improvement of 3.9% to FedAvg and 1.5% to best baselines) and provide both local and global convergence guarantees. |
| Researcher Affiliation | Academia | Ziqing Fan (1,2), Ruipeng Zhang (1,2), Jiangchao Yao (1,2), Bo Han (3), Ya Zhang (1,2), Yanfeng Wang (1,2; corresponding author). (1) Cooperative Medianet Innovation Center, Shanghai Jiao Tong University; (2) Shanghai AI Laboratory; (3) Hong Kong Baptist University. {zqfan_knight, zhangrp, sunarker}@sjtu.edu.cn, bhanml@comp.hkbu.edu.hk, {ya_zhang, wangyanfeng}@sjtu.edu.cn |
| Pseudocode | Yes | Algorithm 1 FedGELA |
| Open Source Code | Yes | Source code is available at: https://github.com/MediaBrain-SJTU/FedGELA. |
| Open Datasets | Yes | We adopt three popular benchmark datasets SVHN [23], CIFAR10/100 [16] in federated learning. As for data splitting, we utilize Dirichlet Distribution (Dir(β), β = {10000, 0.5, 0.2, 0.1}) to simulate the situations of independently identical distribution and different levels of PCDD. Besides, one standard real-world PCDD dataset, Fed-ISIC2019 [4, 7, 34, 35] is used, and we follow the setting in the FLamby benchmark [34]. (A Dirichlet-split sketch follows the table.) |
| Dataset Splits | No | The paper describes data splitting across clients using Dirichlet Distribution but does not explicitly state the train/validation/test dataset splits by percentage or sample count, nor does it explicitly state the use of standard splits for the benchmark datasets. |
| Hardware Specification | Yes | All methods are implemented by PyTorch [27] with NVIDIA GeForce RTX 3090. |
| Software Dependencies | No | The paper mentions 'PyTorch [27]' but does not specify a version number or other software dependencies with their versions. |
| Experiment Setup | Yes | We use SGD with learning rate 0.01, weight decay 1e-4, and momentum 0.9. The batch size is set as 100 and the local updates are set as 10 epochs for all approaches. As for method-specific hyper-parameters like the proximal term in FedProx, we tune it carefully. In our method, there are E_W and E_H to set; we normalize features with length 1 (E_H = 1) and only tune the length scaling of the classifier (E_W). (A local-training sketch follows the table.) |
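
The Dirichlet split quoted in the Open Datasets row is usually implemented with the standard label-wise Dir(β) partitioning recipe, sketched below under stated assumptions: the client count, random seed, and synthetic CIFAR-10-style labels are illustrative and not taken from the paper. Small β (0.1, 0.2) concentrates each class on a few clients, approximating partially class-disjoint data, while β = 10000 is effectively IID.

```python
import numpy as np

def dirichlet_split(labels, num_clients, beta, seed=0):
    """Partition sample indices across clients with a Dirichlet prior.

    For each class, a Dirichlet(beta) vector decides what fraction of that
    class each client receives: small beta gives highly skewed (PCDD-like)
    splits, very large beta approaches an IID split.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx_c = rng.permutation(np.where(labels == c)[0])
        proportions = rng.dirichlet(np.full(num_clients, beta, dtype=float))
        # Cumulative proportions give the split points within this class.
        split_points = (np.cumsum(proportions)[:-1] * len(idx_c)).astype(int)
        for client_id, shard in enumerate(np.split(idx_c, split_points)):
            client_indices[client_id].extend(shard.tolist())
    return [np.array(ix) for ix in client_indices]

# Example: the paper's beta settings on CIFAR-10-style labels (10 clients assumed).
labels = np.random.randint(0, 10, size=50000)
for beta in (10000, 0.5, 0.2, 0.1):
    splits = dirichlet_split(labels, num_clients=10, beta=beta)
```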
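
The Experiment Setup row can likewise be read as the following local-training sketch. It is not the released FedGELA implementation: `local_update`, `backbone`, and the fixed `classifier` tensor (standing in for the paper's classifier with feature length E_H and classifier scaling E_W) are hypothetical names, and the E_W value is a placeholder to be tuned per dataset. Only the quoted optimizer settings (SGD, lr 0.01, weight decay 1e-4, momentum 0.9, 10 local epochs, batch size 100) come from the paper.

```python
import torch
import torch.nn.functional as F
from torch import nn, optim

E_H = 1.0  # feature length; the paper fixes E_H = 1
E_W = 1.0  # classifier length scaling; placeholder, tuned per dataset in the paper

def local_update(backbone, classifier, loader, device, local_epochs=10):
    """One client's local pass with the reported settings:
    SGD(lr=0.01, weight_decay=1e-4, momentum=0.9), 10 local epochs,
    batch size 100 supplied by the DataLoader."""
    backbone.to(device).train()
    classifier = classifier.to(device)  # assumed fixed (num_classes x feat_dim) matrix
    optimizer = optim.SGD(backbone.parameters(), lr=0.01,
                          weight_decay=1e-4, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    for _ in range(local_epochs):
        for images, targets in loader:
            images, targets = images.to(device), targets.to(device)
            # Assumption: backbone returns (batch, feat_dim) features,
            # normalized to length E_H before the scaled classifier.
            feats = F.normalize(backbone(images), dim=1) * E_H
            logits = feats @ (E_W * F.normalize(classifier, dim=1)).t()
            loss = criterion(logits, targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return backbone.state_dict()
```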