Understanding Convergence and Generalization in Federated Learning through Feature Learning Theory

Authors: Wei Huang, Ye Shi, Zhongyi Cai, Taiji Suzuki

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on both synthetic and real-world datasets verify our theoretical conclusions and emphasize the effectiveness of the weighted FedAvg approach.
Researcher Affiliation | Academia | Wei Huang (RIKEN AIP, wei.huang.vr@riken.jp); Ye Shi (ShanghaiTech University, shiye@shanghaitech.edu.cn); Zhongyi Cai (ShanghaiTech University, caizhy@shanghaitech.edu.cn); Taiji Suzuki (The University of Tokyo & RIKEN AIP, taiji@mist.i.u-tokyo.ac.jp)
Pseudocode | No | The paper describes the FedAvg algorithm and mathematical derivations, but it does not present a formal pseudocode or algorithm block.
Open Source Code | Yes | Codes are available at https://anonymous.4open.science/r/fed-feature-learning-31E9/.
Open Datasets | Yes | We conducted experiments on three image classification datasets: CIFAR10, CIFAR100 (Krizhevsky et al., 2009), and Digits.
Dataset Splits | No | The paper mentions 'n_train = 100 and the testing data size is set to n_test = 2000' for synthetic data. For the real-world datasets, it discusses 'training data' and 'testing data' but does not specify a validation set or explicit train/validation/test split percentages.
Hardware Specification | Yes | Algorithms were implemented on PyTorch (Paszke et al., 2019) with an RTX 3090 Ti GPU.
Software Dependencies | No | Algorithms were implemented on PyTorch (Paszke et al., 2019)... The paper mentions PyTorch but does not specify a version number for it or for other software dependencies such as Python or CUDA.
Experiment Setup | Yes | The client number is set to K = 20 and all of them are selected in each communication round. Each client is equipped with a two-layer ReLU CNN described in Section 3.3, where the width is m = 50. Then, we establish our synthetic dataset. The training data size is set to n_train = 100 and the testing data size is set to n_test = 2000, with instance dimension d = 1000 for each client. We further set the signal strength µ^2 = 2 and noise variance σ_p = 1 for all clients. We trained models for E = 5 epochs in the local training phase and conducted R = 100 communication rounds in total. All models are trained with the SGD optimizer with learning rate η = 1. ... For all experiments in this part, models were trained for 300 communication rounds and all tasks were optimized with the SGD optimizer. For experiments on CIFAR10/CIFAR100, we set the learning rate to 0.03 and randomly sampled half of the clients to participate in each communication round. For experiments on Digits, the learning rate is set to 0.01 and all clients are involved in each communication round.
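
The quoted synthetic setup corresponds to a standard FedAvg training loop. Below is a minimal, self-contained sketch under the hyperparameters quoted above (K = 20, m = 50, d = 1000, n_train = 100, n_test = 2000, µ^2 = 2, σ_p = 1, E = 5, R = 100, SGD with η = 1). The two-patch data model, the choice of signal direction, the positive/negative filter-bank readout, the logistic (soft-margin) loss, the full-batch local updates, and the uniform weight averaging are illustrative assumptions standing in for the paper's Section 3.3 definitions; this is not the authors' released code, and the paper's weighted FedAvg variant would replace the uniform average with client-specific weights.

# Hypothetical reconstruction of the synthetic FedAvg experiment; data model,
# architecture details, and loss are assumptions, not the authors' code.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

K, m, d = 20, 50, 1000          # clients, network width, instance dimension
n_train, n_test = 100, 2000     # per-client sample sizes
mu_norm_sq, sigma_p = 2.0, 1.0  # signal strength and noise level (as quoted)
E, R, lr = 5, 100, 1.0          # local epochs, communication rounds, SGD step

def make_client_data(n, mu):
    # Two-patch data (assumed): one patch carries y*mu, the other is Gaussian noise.
    y = torch.randint(0, 2, (n,)) * 2 - 1                  # labels in {-1, +1}
    signal = y[:, None] * mu[None, :]                      # signal patch
    noise = sigma_p * torch.randn(n, d)                    # noise patch
    x = torch.stack([signal, noise], dim=1)                # shape (n, 2, d)
    return x, y.float()

class TwoLayerReluCNN(nn.Module):
    # Width-m two-layer ReLU CNN (assumed form): positive and negative filter
    # banks applied to each patch, fixed second layer, outputs summed.
    def __init__(self):
        super().__init__()
        self.w_pos = nn.Linear(d, m, bias=False)
        self.w_neg = nn.Linear(d, m, bias=False)
    def forward(self, x):                                   # x: (n, 2, d)
        pos = self.w_pos(x).relu().sum(dim=(1, 2))
        neg = self.w_neg(x).relu().sum(dim=(1, 2))
        return (pos - neg) / m                              # scalar score per sample

mu = torch.zeros(d)
mu[0] = mu_norm_sq ** 0.5                                   # assumed direction, ||mu||^2 = 2
clients = [make_client_data(n_train, mu) for _ in range(K)]
global_model = TwoLayerReluCNN()

for r in range(R):                                          # communication rounds
    local_states = []
    for x, y in clients:                                    # all clients participate
        model = copy.deepcopy(global_model)                 # broadcast global weights
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(E):                                   # local epochs (full batch)
            opt.zero_grad()
            loss = F.soft_margin_loss(model(x), y)           # logistic loss (assumed)
            loss.backward()
            opt.step()
        local_states.append(model.state_dict())
    # FedAvg aggregation: uniform average of client weights (equal data sizes)
    avg = {k: torch.stack([s[k] for s in local_states]).mean(0) for k in local_states[0]}
    global_model.load_state_dict(avg)

with torch.no_grad():                                        # evaluate on fresh test data
    x_test, y_test = make_client_data(n_test, mu)
    pred = (global_model(x_test) > 0).float() * 2 - 1
    print(f"test accuracy: {(pred == y_test).float().mean():.3f}")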