Understanding Convergence and Generalization in Federated Learning through Feature Learning Theory

Authors: Wei Huang, Ye Shi, Zhongyi Cai, Taiji Suzuki

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on both synthetic and real-world datasets verify our theoretical conclusions and emphasize the effectiveness of the weighted FedAvg approach.
Researcher Affiliation | Academia | Wei Huang (RIKEN AIP, wei.huang.vr@riken.jp); Ye Shi (ShanghaiTech University, shiye@shanghaitech.edu.cn); Zhongyi Cai (ShanghaiTech University, caizhy@shanghaitech.edu.cn); Taiji Suzuki (The University of Tokyo & RIKEN AIP, taiji@mist.i.u-tokyo.ac.jp)
Pseudocode | No | The paper describes the FedAvg algorithm and mathematical derivations, but it does not present a formal pseudocode or algorithm block.
Open Source Code | Yes | Codes are available at https://anonymous.4open.science/r/fed-feature-learning-31E9/.
Open Datasets | Yes | We conducted experiments on three image classification datasets: CIFAR10, CIFAR100 (Krizhevsky et al., 2009), and Digits.
Dataset Splits | No | The paper mentions 'n_train = 100 and the testing data size is set to n_test = 2000' for synthetic data. For the real-world datasets, it discusses 'training data' and 'testing data' but does not specify a validation set or explicit train/validation/test split percentages.
Hardware Specification | Yes | Algorithms were implemented on PyTorch (Paszke et al., 2019) with an RTX 3090 Ti GPU.
Software Dependencies | No | Algorithms were implemented on PyTorch (Paszke et al., 2019)... The paper mentions PyTorch but does not specify a version number for it or for other software dependencies such as Python or CUDA.
Experiment Setup | Yes | The client number is set to K = 20 and all of them are selected in each communication round. Each client is equipped with a two-layer ReLU CNN described in Section 3.3, where the width is m = 50. Then, we establish our synthetic dataset. The training data size is set to n_train = 100 and the testing data size is set to n_test = 2000, with instance dimension d = 1000 for each client. We further set the signal strength µ^2 = 2 and noise variance σ_p = 1 for all clients. We trained models for E = 5 epochs in the local training phase and conducted R = 100 communication rounds in total. All models are trained with the SGD optimizer with learning rate η = 1. ... For all experiments in this part, models were trained for 300 communication rounds and all tasks were optimized with the SGD optimizer. For experiments on CIFAR10/CIFAR100, we set the learning rate to 0.03 and randomly sampled half of the clients to participate in each communication round. For experiments on Digits, the learning rate is set to 0.01 and all clients are involved in each communication round.
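
The quoted synthetic setup corresponds to a standard FedAvg training loop. Below is a minimal, self-contained sketch under the hyperparameters quoted above (K = 20, m = 50, d = 1000, n_train = 100, n_test = 2000, µ^2 = 2, σ_p = 1, E = 5, R = 100, SGD with η = 1). The two-patch data model, the choice of signal direction, the positive/negative filter-bank readout, the logistic (soft-margin) loss, the full-batch local updates, and the uniform weight averaging are illustrative assumptions standing in for the paper's Section 3.3 definitions; this is not the authors' released code, and the paper's weighted FedAvg variant would replace the uniform average with client-specific weights.

# Hypothetical reconstruction of the synthetic FedAvg experiment; data model,
# architecture details, and loss are assumptions, not the authors' code.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

K, m, d = 20, 50, 1000          # clients, network width, instance dimension
n_train, n_test = 100, 2000     # per-client sample sizes
mu_norm_sq, sigma_p = 2.0, 1.0  # signal strength and noise level (as quoted)
E, R, lr = 5, 100, 1.0          # local epochs, communication rounds, SGD step

def make_client_data(n, mu):
    # Two-patch data (assumed): one patch carries y*mu, the other is Gaussian noise.
    y = torch.randint(0, 2, (n,)) * 2 - 1                  # labels in {-1, +1}
    signal = y[:, None] * mu[None, :]                      # signal patch
    noise = sigma_p * torch.randn(n, d)                    # noise patch
    x = torch.stack([signal, noise], dim=1)                # shape (n, 2, d)
    return x, y.float()

class TwoLayerReluCNN(nn.Module):
    # Width-m two-layer ReLU CNN (assumed form): positive and negative filter
    # banks applied to each patch, fixed second layer, outputs summed.
    def __init__(self):
        super().__init__()
        self.w_pos = nn.Linear(d, m, bias=False)
        self.w_neg = nn.Linear(d, m, bias=False)
    def forward(self, x):                                   # x: (n, 2, d)
        pos = self.w_pos(x).relu().sum(dim=(1, 2))
        neg = self.w_neg(x).relu().sum(dim=(1, 2))
        return (pos - neg) / m                              # scalar score per sample

mu = torch.zeros(d)
mu[0] = mu_norm_sq ** 0.5                                   # assumed direction, ||mu||^2 = 2
clients = [make_client_data(n_train, mu) for _ in range(K)]
global_model = TwoLayerReluCNN()

for r in range(R):                                          # communication rounds
    local_states = []
    for x, y in clients:                                    # all clients participate
        model = copy.deepcopy(global_model)                 # broadcast global weights
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(E):                                   # local epochs (full batch)
            opt.zero_grad()
            loss = F.soft_margin_loss(model(x), y)           # logistic loss (assumed)
            loss.backward()
            opt.step()
        local_states.append(model.state_dict())
    # FedAvg aggregation: uniform average of client weights (equal data sizes)
    avg = {k: torch.stack([s[k] for s in local_states]).mean(0) for k in local_states[0]}
    global_model.load_state_dict(avg)

with torch.no_grad():                                        # evaluate on fresh test data
    x_test, y_test = make_client_data(n_test, mu)
    pred = (global_model(x_test) > 0).float() * 2 - 1
    print(f"test accuracy: {(pred == y_test).float().mean():.3f}")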