Efficient Split-Mix Federated Learning for On-Demand and In-Situ Customization
Authors: Junyuan Hong, Haotao Wang, Zhangyang Wang, Jiayu Zhou
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method provides better in-situ customization than the existing heterogeneous-architecture FL methods. Codes and pre-trained models are available: https://github.com/illidanlab/SplitMix. |
| Researcher Affiliation | Academia | (1) Department of Computer Science and Engineering, Michigan State University; (2) Department of Electrical and Computer Engineering, University of Texas at Austin |
| Pseudocode | Yes | Algorithm 1: Federated Split-Mix Learning; Algorithm 2: SampleBaseModels(P, p, n, W); Algorithm 3: LocalTrain(Wk, Dk, E, η); Algorithm 4: LocalTrain(Wk, Dk, E, η) with DBN and adversarial training (an illustrative sampler is sketched after the table). |
| Open Source Code | Yes | Codes and pre-trained models are available: https://github.com/illidanlab/SplitMix. |
| Open Datasets | Yes | We use the CIFAR10 dataset (Krizhevsky, 2009) with a preactivated ResNet (PreResNet18) (He et al., 2016). The CIFAR10 data are uniformly split into 100 clients, with 3 classes distributed per client. For the (feature) non-i.i.d. configuration, we use Digits with a CNN defined in (Li et al., 2020b) and DomainNet with AlexNet extended with BN layers after each convolutional or linear layer (Li et al., 2020b). The first dataset is a subset (30%) of Digits, a benchmark for domain adaptation (Peng et al., 2019b). The second dataset is DomainNet (Peng et al., 2019a) processed by (Li et al., 2020b), which contains 6 distinct domains of large 256x256 real-world images (a sketch of the label-skewed CIFAR10 split follows the table). |
| Dataset Splits | Yes | The CIFAR10 data are uniformly split into 100 clients, with 3 classes distributed per client. Each domain of Digits (or DomainNet) is split into 10 (or 5) clients, giving 50 (or 30) clients in total. We use an n-step projected gradient descent (PGD) attack (Madry et al., 2018) with a constant noise magnitude ϵ. Following (Madry et al., 2018), we set (ϵ, n) = (8/255, 7) with an inner-loop attack step size of 2/255 for training, validation, and test (a PGD sketch follows the table). |
| Hardware Specification | Yes | We implement all algorithms in PyTorch 1.4.1 and run them on a single NVIDIA RTX A5000 GPU and a 104-thread CPU. |
| Software Dependencies | Yes | We implement all algorithms in PyTorch 1.4.1 and run them on a single NVIDIA RTX A5000 GPU and a 104-thread CPU. |
| Experiment Setup | Yes | In general, for local optimization we use stochastic gradient descent (SGD) with 0.9 momentum and 5 × 10^-4 weight decay. CIFAR10: Following HeteroFL (Diao et al., 2021), we train with 5 local epochs and 400 global communication rounds. Globally, we initialize the learning rate at 0.01 and decay it by a factor of 0.1 at communication rounds 150 and 250. Locally, we use a larger batch size of 128 to speed up the training in simulation. Digits: We use a cosine annealing learning rate decaying from 0.1 to 0 across 400 global communication rounds. SGD is executed with one epoch for each local client. DomainNet: We use a constant learning rate of 0.01 and run 400 communication rounds in total. Similar to Digits, SGD is executed with one epoch for each local client (an optimizer/scheduler sketch follows the table). |
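
The pseudocode row lists SampleBaseModels(P, p, n, W) among the paper's algorithms. The snippet below is only an assumption about how a round-robin sampler with that signature could look; the authoritative version is Algorithm 2 in the paper and the SplitMix repository.

```python
# Illustrative sampler matching the signature SampleBaseModels(P, p, n, W)
# listed in the pseudocode row; this is an assumption about its behaviour,
# not code from the SplitMix repository.
import random

def sample_base_models(P, p, n, W):
    """Pick n base models from W, cycling through permutation P starting at position p."""
    if P is None:
        idx = random.sample(range(len(W)), n)          # uniform fallback
    else:
        idx = [P[(p + i) % len(P)] for i in range(n)]  # round-robin selection
    return [W[i] for i in idx]
```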
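The CIFAR10 partition described in the Open Datasets row (100 clients, 3 classes each) can be approximated with a simple label-skew splitter. This is a hypothetical sketch, not the partitioning code from the SplitMix repository; the function name `split_by_class` and the even-shard heuristic are assumptions.

```python
# Hypothetical sketch of a label-skewed federated split (3 classes per client);
# the exact partitioning code in the SplitMix repository may differ.
import numpy as np

def split_by_class(labels, num_clients=100, classes_per_client=3, num_classes=10, seed=0):
    """Assign each client a subset of classes, then deal out sample indices."""
    rng = np.random.default_rng(seed)
    # Shuffle the sample indices of each class once.
    class_indices = {c: rng.permutation(np.where(labels == c)[0]).tolist()
                     for c in range(num_classes)}
    # Randomly pick which classes each client holds.
    client_classes = [rng.choice(num_classes, classes_per_client, replace=False)
                      for _ in range(num_clients)]
    # Count holders per class so shards are evenly sized.
    holders = {c: sum(c in cc for cc in client_classes) for c in range(num_classes)}
    offsets = {c: 0 for c in range(num_classes)}
    client_indices = [[] for _ in range(num_clients)]
    for k, cc in enumerate(client_classes):
        for c in cc:
            shard = len(class_indices[c]) // max(holders[c], 1)
            start = offsets[c]
            client_indices[k].extend(class_indices[c][start:start + shard])
            offsets[c] += shard
    return client_indices
```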
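The Dataset Splits row fixes the PGD hyperparameters (ϵ, n) = (8/255, 7) with step size 2/255. Below is a minimal PyTorch sketch of such an n-step PGD attack; the paper integrates the attack into its local update with dual batch normalization (Algorithm 4), which is not reproduced here.

```python
# Minimal PyTorch sketch of an n-step PGD attack with the stated hyperparameters
# (eps = 8/255, n = 7, step size = 2/255).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
    """Return adversarial examples crafted by projected gradient descent."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascent step on the loss, then project back into the eps-ball around x.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```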
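The Experiment Setup row specifies SGD with momentum 0.9 and weight decay 5 × 10^-4, plus a different learning-rate schedule per dataset. The helper below restates those settings with standard PyTorch schedulers; `make_optimizer` is an illustrative name, and the schedulers are assumed to be stepped once per global communication round.

```python
# Sketch of the per-dataset optimizer/scheduler settings quoted above;
# this is a restatement with stock PyTorch utilities, not the SplitMix training code.
import torch

def make_optimizer(model, dataset, rounds=400):
    opt = torch.optim.SGD(model.parameters(),
                          lr=0.1 if dataset == "digits" else 0.01,
                          momentum=0.9, weight_decay=5e-4)
    if dataset == "cifar10":
        # Decay by 0.1 at communication rounds 150 and 250.
        sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[150, 250], gamma=0.1)
    elif dataset == "digits":
        # Cosine annealing from 0.1 to 0 over all communication rounds.
        sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=rounds, eta_min=0.0)
    else:  # domainnet: constant learning rate
        sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=lambda r: 1.0)
    return opt, sched  # sched.step() assumed to be called once per global round
```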