Understanding Convergence and Generalization in Federated Learning through Feature Learning Theory
Authors: Wei Huang, Ye Shi, Zhongyi Cai, Taiji Suzuki
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on both synthetic and real-world datasets verify our theoretical conclusions and emphasize the effectiveness of the weighted FedAvg approach. |
| Researcher Affiliation | Academia | Wei Huang (RIKEN AIP, wei.huang.vr@riken.jp); Ye Shi (ShanghaiTech University, shiye@shanghaitech.edu.cn); Zhongyi Cai (ShanghaiTech University, caizhy@shanghaitech.edu.cn); Taiji Suzuki (The University of Tokyo & RIKEN AIP, taiji@mist.i.u-tokyo.ac.jp) |
| Pseudocode | No | The paper describes the FedAvg algorithm and its mathematical derivations, but it does not present a formal pseudocode or algorithm block; a minimal weighted-FedAvg sketch is given after this table. |
| Open Source Code | Yes | Codes are available at https://anonymous.4open.science/r/fed-feature-learning-31E9/. |
| Open Datasets | Yes | We conducted experiments on three image classification datasets: CIFAR10, CIFAR100 (Krizhevsky et al., 2009), and Digits. |
| Dataset Splits | No | The paper specifies 'the training data size is set to n_train = 100 and the testing data size is set to n_test = 2000' for the synthetic data. For the real-world datasets it discusses training and testing data but does not specify a validation set or explicit train/validation/test split percentages. |
| Hardware Specification | Yes | Algorithms were implemented on PyTorch (Paszke et al., 2019) with an RTX 3090 Ti GPU. |
| Software Dependencies | No | Algorithms were implemented on PyTorch (Paszke et al., 2019)... The paper mentions PyTorch but does not specify a version number for it or for other software dependencies such as Python or CUDA. |
| Experiment Setup | Yes | The client number is set to K = 20 and all clients are selected in each communication round. Each client is equipped with a two-layer ReLU CNN described in Section 3.3 with width m = 50. The synthetic dataset uses a training data size of n_train = 100 and a testing data size of n_test = 2000 with instance dimension d = 1000 for each client; the signal strength is ∥µ∥^2 = 2 and the noise variance is σ_p = 1 for all clients. Models are trained for E = 5 epochs in the local training phase and R = 100 communication rounds in total, using the SGD optimizer with learning rate η = 1 (a sketch of this synthetic setup follows the table). ... For the real-world experiments, models were trained for 300 communication rounds and all tasks were optimized with SGD. For CIFAR10/CIFAR100, the learning rate is set to 0.03 and half of the clients are randomly sampled in each communication round; for Digits, the learning rate is set to 0.01 and all clients participate in each round. |
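
Since the paper does not provide an algorithm block, the following is a minimal sketch of weighted FedAvg in PyTorch, matching the setup quoted above (local SGD epochs followed by server-side parameter averaging). Weighting the average by each client's local sample size is one common choice and an assumption here; `local_train`, `weighted_fedavg`, and all variable names are hypothetical, not the authors' code.

```python
# Minimal weighted-FedAvg sketch (illustrative; not the authors' implementation).
import copy
import torch

def local_train(model, loader, epochs, lr, loss_fn):
    """Plain local SGD update, matching the SGD optimizer noted in the setup."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

def weighted_fedavg(global_model, client_loaders, rounds, local_epochs, lr, loss_fn):
    """Run `rounds` communication rounds of FedAvg over the given clients."""
    for _ in range(rounds):
        states, sizes = [], []
        for loader in client_loaders:
            # Each client starts from the current global model.
            local_model = copy.deepcopy(global_model)
            local_train(local_model, loader, local_epochs, lr, loss_fn)
            states.append(local_model.state_dict())
            sizes.append(len(loader.dataset))
        # Server step: average client parameters, weighted by local data size.
        total = float(sum(sizes))
        avg_state = {
            key: sum((n / total) * s[key].float() for s, n in zip(states, sizes))
            for key in states[0]
        }
        global_model.load_state_dict(avg_state)
    return global_model
```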
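The sketch below instantiates the synthetic setup quoted in the Experiment Setup row (K = 20 clients, n_train = 100, d = 1000, ∥µ∥^2 = 2, σ_p = 1, width m = 50). The two-patch signal/noise construction and the CNN parameterization are assumptions based on common feature-learning-theory setups, not the paper's exact data model or network code.

```python
# Hypothetical signal-plus-noise data and two-layer ReLU CNN for the synthetic setup.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

d, m = 1000, 50
mu = torch.zeros(d)
mu[0] = 2 ** 0.5          # fixed signal vector with ||mu||^2 = 2 (assumed direction)
sigma_p = 1.0

def make_client_data(n):
    y = 2 * torch.randint(0, 2, (n,)).float() - 1   # labels in {-1, +1}
    signal = y[:, None] * mu                         # signal patch, aligned with the label
    noise = sigma_p * torch.randn(n, d)              # independent noise patch
    x = torch.stack([signal, noise], dim=1)          # shape (n, 2, d): two patches per sample
    return x, y

class TwoLayerReluCNN(nn.Module):
    """Two-layer CNN: m filters applied to each patch, ReLU, then summed."""
    def __init__(self, d, m):
        super().__init__()
        self.w_pos = nn.Parameter(0.01 * torch.randn(m, d))  # filters for the positive output
        self.w_neg = nn.Parameter(0.01 * torch.randn(m, d))  # filters for the negative output

    def forward(self, x):                                    # x: (n, 2, d)
        pos = torch.relu(torch.einsum('npd,md->npm', x, self.w_pos)).sum(dim=(1, 2))
        neg = torch.relu(torch.einsum('npd,md->npm', x, self.w_neg)).sum(dim=(1, 2))
        return (pos - neg) / m

# Example wiring with the weighted FedAvg sketch above (K = 20 clients, logistic loss assumed).
clients = [DataLoader(TensorDataset(*make_client_data(100)), batch_size=100) for _ in range(20)]
model = TwoLayerReluCNN(d, m)
logistic = lambda out, y: nn.functional.softplus(-y * out).mean()
weighted_fedavg(model, clients, rounds=100, local_epochs=5, lr=1.0, loss_fn=logistic)
```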