Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning

Authors: Tomoya Murata, Taiji Suzuki

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical results are given to verify the theoretical findings and give empirical evidence of the superiority of our method. We conducted a ten-class classification on CIFAR10 dataset. For each heterogeneity, we compared the empirical performances of our method and several existing methods.
Researcher Affiliation | Collaboration | (1) NTT DATA Mathematical Systems Inc., Tokyo, Japan; (2) Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan; (3) Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan.
Pseudocode | Yes | Algorithm 1 Local GD(x̃0, η, B, b, K, T) ... Algorithm 2 BVR-L-SGD(x̃0, η, b, b̃, K, T, S) ... Algorithm 3 Local-Routine(p, x0, η, v0, b, K). (A hedged sketch of a local-update loop of this form appears after the table.)
Open Source Code | No | The paper does not provide any explicit statements about the release of source code or links to a code repository.
Open Datasets | Yes | We conducted a ten-class classification on CIFAR10 dataset.
Dataset Splits | No | The paper mentions 'train loss' and 'test accuracy' but does not explicitly specify train/validation/test dataset splits (e.g., percentages or sample counts), nor does it mention cross-validation. It implies a train/test split, but no validation split is described.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | For each local computation budget B ∈ {256, 512, 1024}, we set K = B/16 and b = 16 for local methods (Local SGD, SCAFFOLD, and BVR-L-SGD), and b = B for non-local ones (minibatch SGD and SARAH). For each algorithm, we tuned the learning rate η over {0.005, 0.01, 0.05, 0.1, 0.5, 1.0}. We conducted our experiments using a one-hidden-layer fully connected neural network with 100 hidden units and softplus activation. For the loss function, we used the standard cross-entropy loss. We initialized the parameters by uniformly sampling from [−√(6/(nin + nout)), √(6/(nin + nout))] (Glorot & Bengio, 2010), where nin and nout are the numbers of units in the input and output layers, respectively. Furthermore, we added an L2-regularizer to the empirical risk with a fixed regularization parameter of 5 × 10⁻³. (A sketch of this setup follows the table.)
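
To make the Pseudocode row more concrete, here is a minimal sketch of a Local SGD-style loop matching the Algorithm 1 signature (x̃0, η, B, b, K, T): each of P workers takes K local minibatch steps of size b starting from the shared iterate, and the server averages the local models each communication round. The objective, gradient oracle, and worker data below are illustrative stand-ins, not taken from the paper.

```python
import numpy as np

def local_sgd(x0, eta, b, K, T, worker_data, grad, rng):
    """Sketch of a Local SGD loop: T communication rounds; in each round every
    worker runs K local minibatch-SGD steps of size b, then the server averages
    the local iterates. `grad(x, batch)` is a stochastic gradient oracle."""
    x = x0.copy()
    P = len(worker_data)
    for _ in range(T):                       # communication rounds
        local_iterates = []
        for p in range(P):                   # in practice these run in parallel on workers
            xp = x.copy()
            data_p = worker_data[p]
            for _ in range(K):               # K local steps per round
                batch = data_p[rng.choice(len(data_p), size=b)]
                xp -= eta * grad(xp, batch)
            local_iterates.append(xp)
        x = np.mean(local_iterates, axis=0)  # server averages the local models
    return x

# Toy usage: heterogeneous least-squares problems, one target per worker.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, P = 5, 4
    targets = [rng.normal(size=d) for _ in range(P)]              # worker-specific optima
    worker_data = [t + 0.1 * rng.normal(size=(256, d)) for t in targets]
    grad = lambda x, batch: x - batch.mean(axis=0)                # grad of 0.5*||x - sample||^2, batch-averaged
    x_hat = local_sgd(np.zeros(d), eta=0.1, b=16, K=16, T=50,
                      worker_data=worker_data, grad=grad, rng=rng)
    print(x_hat)
```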
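The Experiment Setup row can likewise be read as a short model configuration. The snippet below is a sketch in PyTorch (my choice; the paper does not state its framework) of the described one-hidden-layer network with 100 softplus units, Glorot-style uniform initialization, cross-entropy loss, and an L2 penalty with coefficient 5 × 10⁻³ expressed as weight decay. The input/output sizes assume flattened 32×32×3 CIFAR-10 images and 10 classes.

```python
import math
import torch
import torch.nn as nn

def glorot_uniform_(layer: nn.Linear) -> None:
    """Uniform init on [-sqrt(6/(n_in + n_out)), sqrt(6/(n_in + n_out))]
    (Glorot & Bengio, 2010), as described in the setup."""
    bound = math.sqrt(6.0 / (layer.in_features + layer.out_features))
    nn.init.uniform_(layer.weight, -bound, bound)
    nn.init.zeros_(layer.bias)

n_in, n_hidden, n_out = 32 * 32 * 3, 100, 10    # flattened CIFAR-10 images, 10 classes

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(n_in, n_hidden),
    nn.Softplus(),                              # softplus activation
    nn.Linear(n_hidden, n_out),
)
for m in model:
    if isinstance(m, nn.Linear):
        glorot_uniform_(m)

criterion = nn.CrossEntropyLoss()               # standard cross-entropy loss

# L2 regularization with coefficient 5e-3, implemented here as SGD weight decay.
# (Whether this matches the paper exactly depends on whether its coefficient
# multiplies ||w||^2 or ||w||^2 / 2.) The learning rate would be tuned over
# {0.005, 0.01, 0.05, 0.1, 0.5, 1.0} as stated in the setup.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, weight_decay=5e-3)
```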