Federated Learning with Bilateral Curation for Partially Class-Disjoint Data

Authors: Ziqing Fan, Ruipeng Zhang, Jiangchao Yao, Bo Han, Ya Zhang, Yanfeng Wang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on a range of datasets to demonstrate that our FedGELA achieves promising performance (averaged improvement of 3.9% over FedAvg and 1.5% over the best baselines) and provide both local and global convergence guarantees.
Researcher Affiliation | Academia | Ziqing Fan (1,2), Ruipeng Zhang (1,2), Jiangchao Yao (1,2), Bo Han (3), Ya Zhang (1,2), Yanfeng Wang (1,2); 1 Cooperative Medianet Innovation Center, Shanghai Jiao Tong University; 2 Shanghai AI Laboratory; 3 Hong Kong Baptist University. {zqfan_knight, zhangrp, sunarker}@sjtu.edu.cn, bhanml@comp.hkbu.edu.hk, {ya_zhang, wangyanfeng}@sjtu.edu.cn
Pseudocode | Yes | Algorithm 1: FedGELA
Open Source Code | Yes | Source code is available at: https://github.com/MediaBrain-SJTU/FedGELA
Open Datasets | Yes | We adopt three popular benchmark datasets in federated learning: SVHN [23] and CIFAR-10/100 [16]. For data splitting, we use a Dirichlet distribution (Dir(β), β ∈ {10000, 0.5, 0.2, 0.1}) to simulate both the IID setting and different levels of PCDD. In addition, one standard real-world PCDD dataset, Fed-ISIC2019 [4, 7, 34, 35], is used, following the setting of the Flamby benchmark [34] (a Dirichlet-split sketch is given after this table).
Dataset Splits | No | The paper describes data splitting across clients using a Dirichlet distribution but does not explicitly state the train/validation/test dataset splits by percentage or sample count, nor does it explicitly state the use of standard splits for the benchmark datasets.
Hardware Specification | Yes | All methods are implemented in PyTorch [27] and run on an NVIDIA GeForce RTX 3090.
Software Dependencies | No | The paper mentions PyTorch [27] but does not specify a version number or other software dependencies with their versions.
Experiment Setup | Yes | We use SGD with learning rate 0.01, weight decay 1e-4, and momentum 0.9. The batch size is set to 100 and local updates are set to 10 epochs for all approaches. Method-specific hyper-parameters, such as the proximal term in FedProx, are tuned carefully. Our method has two hyper-parameters to set, E_W and E_H: we normalize features to unit length (E_H = 1) and only tune the length scaling of the classifier (E_W) (a training-configuration sketch is given after this table).
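
The Dirichlet-based client split mentioned above can be illustrated with a minimal sketch. The helper name `dirichlet_split` and its interface are hypothetical and not taken from the released FedGELA code; per class, client proportions are drawn from Dir(β), so a very large β (e.g. 10000) approximates the IID case while a small β (e.g. 0.1) yields strongly partially class-disjoint clients.

```python
# Minimal sketch of a Dirichlet-based client split (hypothetical helper,
# not the authors' code).
import numpy as np

def dirichlet_split(labels, num_clients, beta, seed=0):
    """Return one array of sample indices per client."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx_c = rng.permutation(np.where(labels == c)[0])
        # Fraction of class c assigned to each client, drawn from Dir(beta).
        props = rng.dirichlet(np.full(num_clients, beta))
        cuts = (np.cumsum(props)[:-1] * len(idx_c)).astype(int)
        for client_id, shard in enumerate(np.split(idx_c, cuts)):
            client_indices[client_id].extend(shard.tolist())
    return [np.array(ix) for ix in client_indices]

# Example: simulate PCDD on CIFAR-10-style labels with beta = 0.1 and 10 clients.
# splits = dirichlet_split(train_labels, num_clients=10, beta=0.1)
```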
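
The reported training configuration can likewise be expressed as a short PyTorch sketch. Only the stated hyper-parameters (SGD with lr 0.01, momentum 0.9, weight decay 1e-4, batch size 100, E_H = 1) come from the paper; the toy backbone, the illustrative E_W value, and the way the fixed lengths are applied are assumptions for illustration, not the released FedGELA implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy backbone/classifier placeholders; the paper's architectures differ.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
classifier = nn.Linear(128, 10, bias=False)

# Reported optimizer settings: SGD, lr 0.01, momentum 0.9, weight decay 1e-4.
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(classifier.parameters()),
    lr=0.01, momentum=0.9, weight_decay=1e-4)

E_W = 10.0  # classifier length scaling; tuned in the paper, value here is illustrative

def logits_with_fixed_lengths(x):
    # E_H = 1: features are normalized to unit length.
    h = F.normalize(backbone(x), dim=1)
    # Classifier directions are unit-normalized, then scaled by E_W (assumed usage).
    w = E_W * F.normalize(classifier.weight, dim=1)
    return h @ w.t()

# One local step on a dummy batch of the reported size (100).
x, y = torch.randn(100, 3, 32, 32), torch.randint(0, 10, (100,))
loss = F.cross_entropy(logits_with_fixed_lengths(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```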