Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning

Authors: Tomoya Murata, Taiji Suzuki

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical results are given to verify the theoretical findings and give empirical evidence of the superiority of our method. We conducted a ten-class classification on CIFAR10 dataset. For each heterogeneity, we compared the empirical performances of our method and several existing methods.
Researcher Affiliation | Collaboration | (1) NTT DATA Mathematical Systems Inc., Tokyo, Japan; (2) Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan; (3) Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan.
Pseudocode | Yes | Algorithm 1 Local GD(x̃0, η, B, b, K, T) ... Algorithm 2 BVR-L-SGD(x̃0, η, b, b̃, K, T, S) ... Algorithm 3 Local-Routine(p, x0, η, v0, b, K). (A hedged sketch of a local-update loop of this form appears after the table.)
Open Source Code | No | The paper does not provide any explicit statements about the release of source code or links to a code repository.
Open Datasets | Yes | We conducted a ten-class classification on CIFAR10 dataset.
Dataset Splits | No | The paper mentions 'train loss' and 'test accuracy' but does not explicitly specify train/validation/test dataset splits (e.g., percentages or sample counts), nor does it mention cross-validation. It implies a train/test split, but no validation split is described.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | For each local computation budget B ∈ {256, 512, 1024}, we set K = B/16 and b = 16 for local methods (Local SGD, SCAFFOLD, and BVR-L-SGD), and b = B for non-local ones (minibatch SGD and SARAH). For each algorithm, we tuned the learning rate η over {0.005, 0.01, 0.05, 0.1, 0.5, 1.0}. We conducted our experiments using a one-hidden-layer fully connected neural network with 100 hidden units and softplus activation. For the loss function, we used the standard cross-entropy loss. We initialized the parameters by uniformly sampling from [−√(6/(nin + nout)), √(6/(nin + nout))] (Glorot & Bengio, 2010), where nin and nout are the numbers of units in the input and output layers, respectively. Furthermore, we added an L2-regularizer to the empirical risk with a fixed regularization parameter of 5 × 10⁻³. (A sketch of this setup follows the table.)
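
To make the Pseudocode row more concrete, here is a minimal sketch of a Local SGD-style loop matching the Algorithm 1 signature (x̃0, η, B, b, K, T): each of P workers takes K local minibatch steps of size b starting from the shared iterate, and the server averages the local models each communication round. The objective, gradient oracle, and worker data below are illustrative stand-ins, not taken from the paper.

```python
import numpy as np

def local_sgd(x0, eta, b, K, T, worker_data, grad, rng):
    """Sketch of a Local SGD loop: T communication rounds; in each round every
    worker runs K local minibatch-SGD steps of size b, then the server averages
    the local iterates. `grad(x, batch)` is a stochastic gradient oracle."""
    x = x0.copy()
    P = len(worker_data)
    for _ in range(T):                       # communication rounds
        local_iterates = []
        for p in range(P):                   # in practice these run in parallel on workers
            xp = x.copy()
            data_p = worker_data[p]
            for _ in range(K):               # K local steps per round
                batch = data_p[rng.choice(len(data_p), size=b)]
                xp -= eta * grad(xp, batch)
            local_iterates.append(xp)
        x = np.mean(local_iterates, axis=0)  # server averages the local models
    return x

# Toy usage: heterogeneous least-squares problems, one target per worker.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, P = 5, 4
    targets = [rng.normal(size=d) for _ in range(P)]              # worker-specific optima
    worker_data = [t + 0.1 * rng.normal(size=(256, d)) for t in targets]
    grad = lambda x, batch: x - batch.mean(axis=0)                # grad of 0.5*||x - sample||^2, batch-averaged
    x_hat = local_sgd(np.zeros(d), eta=0.1, b=16, K=16, T=50,
                      worker_data=worker_data, grad=grad, rng=rng)
    print(x_hat)
```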
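The Experiment Setup row can likewise be read as a short model configuration. The snippet below is a sketch in PyTorch (my choice; the paper does not state its framework) of the described one-hidden-layer network with 100 softplus units, Glorot-style uniform initialization, cross-entropy loss, and an L2 penalty with coefficient 5 × 10⁻³ expressed as weight decay. The input/output sizes assume flattened 32×32×3 CIFAR-10 images and 10 classes.

```python
import math
import torch
import torch.nn as nn

def glorot_uniform_(layer: nn.Linear) -> None:
    """Uniform init on [-sqrt(6/(n_in + n_out)), sqrt(6/(n_in + n_out))]
    (Glorot & Bengio, 2010), as described in the setup."""
    bound = math.sqrt(6.0 / (layer.in_features + layer.out_features))
    nn.init.uniform_(layer.weight, -bound, bound)
    nn.init.zeros_(layer.bias)

n_in, n_hidden, n_out = 32 * 32 * 3, 100, 10    # flattened CIFAR-10 images, 10 classes

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(n_in, n_hidden),
    nn.Softplus(),                              # softplus activation
    nn.Linear(n_hidden, n_out),
)
for m in model:
    if isinstance(m, nn.Linear):
        glorot_uniform_(m)

criterion = nn.CrossEntropyLoss()               # standard cross-entropy loss

# L2 regularization with coefficient 5e-3, implemented here as SGD weight decay.
# (Whether this matches the paper exactly depends on whether its coefficient
# multiplies ||w||^2 or ||w||^2 / 2.) The learning rate would be tuned over
# {0.005, 0.01, 0.05, 0.1, 0.5, 1.0} as stated in the setup.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, weight_decay=5e-3)
```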