Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning
Authors: Tomoya Murata, Taiji Suzuki
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical results are given to verify the theoretical findings and give empirical evidence of the superiority of our method. We conducted a ten-class classification on CIFAR10 dataset. For each heterogeneity, we compared the empirical performances of our method and several existing methods. |
| Researcher Affiliation | Collaboration | NTT DATA Mathematical Systems Inc., Tokyo, Japan; Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan; Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan. |
| Pseudocode | Yes | Algorithm 1 Local GD(x̃0, η, B, b, K, T) ... Algorithm 2 BVR-L-SGD(x̃0, η, b, b̃, K, T, S) ... Algorithm 3 Local-Routine(p, x0, η, v0, b, K) (an illustrative structural sketch is given after the table) |
| Open Source Code | No | The paper does not provide any explicit statements about the release of source code or links to a code repository. |
| Open Datasets | Yes | We conducted a ten-class classification on CIFAR10 dataset. |
| Dataset Splits | No | The paper mentions 'train loss' and 'test accuracy' but does not explicitly specify train/validation/test dataset splits (e.g., percentages or sample counts) nor does it mention cross-validation. It implies a train and test split but no validation split is described. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | For each local computation budget B ∈ {256, 512, 1024}, we set K = B/16 and b = 16 for local methods (Local SGD, SCAFFOLD and BVR-L-SGD), and b = B for non-local ones (minibatch SGD and SARAH). For each algorithm, we tuned the learning rate η from {0.005, 0.01, 0.05, 0.1, 0.5, 1.0}. We conducted our experiments using a one-hidden-layer fully connected neural network with 100 hidden units and softplus activation. For the loss function, we used the standard cross-entropy loss. We initialized parameters by uniformly sampling from [−√(6/(nin + nout)), √(6/(nin + nout))] (Glorot & Bengio, 2010), where nin and nout are the numbers of units in the input and output layers respectively. Furthermore, we added an L2-regularizer to the empirical risk with fixed regularization parameter 5 × 10⁻³. (A hedged model-setup sketch is given after the table.) |
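
To make the pseudocode row concrete, here is a minimal structural sketch of a bias-variance reduced local SGD round: workers build a reference gradient at the shared point, run K local steps with a SARAH-style recursive gradient estimator, and the server averages the resulting iterates. This is only a sketch of the general technique, not the authors' exact Algorithms 2 and 3; the toy least-squares objectives, function names, and default hyperparameters are assumptions made for illustration.

```python
# Illustrative sketch only (NOT the paper's exact Algorithms 2-3): variance-reduced
# local SGD on toy, heterogeneous least-squares problems. All problem data, names,
# and defaults below are assumptions for demonstration purposes.
import numpy as np

rng = np.random.default_rng(0)
P, D, N = 4, 10, 200                       # workers, dimension, samples per worker

# Worker p's loss: f_p(x) = (1/2N) * ||A_p x - y_p||^2 (heterogeneous across p).
A = [rng.normal(size=(N, D)) for _ in range(P)]
y = [rng.normal(size=N) for _ in range(P)]

def grad(p, x, idx):
    """Minibatch gradient of worker p's loss at x over sample indices idx."""
    Ap, yp = A[p][idx], y[p][idx]
    return Ap.T @ (Ap @ x - yp) / len(idx)

def local_routine(p, x0, eta, v0, b, K):
    """K local steps with a SARAH-style estimator v_k = g(x_k) - g(x_{k-1}) + v_{k-1}."""
    x_prev, x, v = x0.copy(), x0 - eta * v0, v0
    for _ in range(K - 1):
        idx = rng.choice(N, size=b, replace=False)
        v = grad(p, x, idx) - grad(p, x_prev, idx) + v
        x_prev, x = x, x - eta * v
    return x

def bvr_l_sgd_like(T=50, K=16, b=16, b_big=64, eta=0.05):
    x = np.zeros(D)
    for _ in range(T):
        # Reference gradient at the shared point, built from larger minibatches
        # and averaged over workers (the bias/variance-reducing ingredient).
        v0 = np.mean([grad(p, x, rng.choice(N, size=b_big, replace=False))
                      for p in range(P)], axis=0)
        # Each worker runs K local steps; the server averages the local iterates.
        x = np.mean([local_routine(p, x.copy(), eta, v0, b, K)
                     for p in range(P)], axis=0)
    return x

x_final = bvr_l_sgd_like()
full_grad = np.mean([grad(p, x_final, np.arange(N)) for p in range(P)], axis=0)
print("average full-gradient norm after training:", np.linalg.norm(full_grad))
```

The call signatures loosely mirror the parameter lists quoted in the pseudocode row (x̃0, η, b, b̃, K, T and p, x0, η, v0, b, K), which is why the reference batch size `b_big` and the local routine are kept as separate arguments.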
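
The experiment-setup row also maps onto a short model definition. The sketch below assumes a PyTorch implementation (the paper does not name its framework): a one-hidden-layer fully connected network with 100 hidden units and softplus activation, cross-entropy loss, Glorot uniform initialization, and the 5 × 10⁻³ L2-regularizer expressed as weight decay (up to the usual factor-of-two convention).

```python
# Minimal sketch of the reported experimental model, assuming PyTorch
# (framework, input flattening, and optimizer wiring are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class OneHiddenLayerNet(nn.Module):
    def __init__(self, n_in=3 * 32 * 32, n_hidden=100, n_out=10):
        super().__init__()
        self.fc1 = nn.Linear(n_in, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_out)
        # Glorot & Bengio (2010) uniform init: U(-sqrt(6/(n_in+n_out)), +sqrt(6/(n_in+n_out)))
        nn.init.xavier_uniform_(self.fc1.weight)
        nn.init.xavier_uniform_(self.fc2.weight)

    def forward(self, x):
        x = x.flatten(start_dim=1)           # CIFAR10 images -> 3072-dim vectors
        return self.fc2(F.softplus(self.fc1(x)))

model = OneHiddenLayerNet()
criterion = nn.CrossEntropyLoss()            # standard cross-entropy loss
# weight_decay stands in for the L2-regularizer with coefficient 5e-3;
# the learning rate would be tuned over {0.005, 0.01, 0.05, 0.1, 0.5, 1.0}.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, weight_decay=5e-3)
```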