On Large-Cohort Training for Federated Learning

Authors: Zachary Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, Virginia Smith

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We give partial answers to these questions based on extensive empirical evaluation."
Researcher Affiliation | Collaboration | Zachary Charles (Google, zachcharles@google.com); Zachary Garrett (Google, zachgarrett@google.com); Zhouyuan Huo (Google, zhhuo@google.com); Sergei Shmulyian (Google, sshmulyian@google.com); Virginia Smith (Carnegie Mellon University, smithv@cmu.edu)
Pseudocode | Yes | "Algorithm 1: FedOpt framework" (see the FedOpt sketch below)
Open Source Code | Yes | "We provide open-source implementations of all simulations in TensorFlow Federated [4]." Code: https://github.com/google-research/federated/tree/f4e26c1b9b47ac320e520a8b9943ea2c5324b8c2/large_cohort
Open Datasets | Yes | "We use four datasets: CIFAR-100 [35], EMNIST [13], Shakespeare [8], and Stack Overflow [3]." (see the loading sketch below)
Dataset Splits | Yes | "We tune learning rates for all algorithms and models using a held-out validation set: We perform T = 1500 rounds of training with M = 50, E = 1 for each algorithm and model, varying η_c, η_s over {10^i | -3 ≤ i ≤ 1} and select the values that maximize the average validation performance over 5 random trials."
Hardware Specification | No | "All experiments were conducted using clusters of multi-core CPUs, though our results are independent of wall-clock time and amount of compute resources." (Too general; no specific CPU models or detailed cluster specifications are provided.)
Software Dependencies | No | "We provide open-source implementations of all simulations in TensorFlow Federated [4]." (No version number is given for TensorFlow Federated or other libraries.)
Experiment Setup | Yes | "We set p_k to be the number of examples in client k's dataset. We tune learning rates for all algorithms and models using a held-out validation set: We perform T = 1500 rounds of training with M = 50, E = 1 for each algorithm and model, varying η_c, η_s over {10^i | -3 ≤ i ≤ 1} and select the values that maximize the average validation performance over 5 random trials. All other hyperparameters (such as momentum) are fixed." (see the tuning sketch below)
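For context on the Pseudocode row: Algorithm 1 in the paper is the FedOpt framework (generalized FedAvg), in which sampled clients run local training from the current global model and the server applies a server optimizer to the weighted average of their model deltas. The following is a minimal NumPy sketch of that round structure under those assumptions, not the authors' implementation; the `client_grad` callback and the dataset layout are hypothetical placeholders.

```python
# Minimal sketch of a FedOpt round: clients run E epochs of local SGD,
# the server averages the client deltas (weighted, e.g., by p_k) and
# applies a server optimizer step. `client_grad` is a hypothetical
# gradient callback; batching/dataset structure is illustrative only.
import numpy as np

def local_sgd(x_global, client_batches, client_lr, epochs, client_grad):
    """Run E epochs of SGD on one client's data; return the model delta."""
    x = x_global.copy()
    for _ in range(epochs):
        for batch in client_batches:
            x -= client_lr * client_grad(x, batch)
    return x - x_global  # client update Delta_k

def fedopt_round(x_global, sampled_clients, weights, client_lr, server_lr,
                 epochs, client_grad):
    """One round of FedOpt with a plain SGD server optimizer (FedAvg-style)."""
    deltas = [local_sgd(x_global, batches, client_lr, epochs, client_grad)
              for batches in sampled_clients]
    avg_delta = sum(w * d for w, d in zip(weights, deltas)) / sum(weights)
    # The server optimizer (SGD here; Adam/Adagrad in FedAdam/FedAdagrad)
    # treats -avg_delta as a pseudo-gradient, so server SGD adds the delta.
    return x_global + server_lr * avg_delta
```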
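All four datasets in the Open Datasets row ship with TensorFlow Federated as federated simulation datasets. A loading sketch is below; since no TFF version is pinned (see the Software Dependencies row), exact return values and signatures may differ across releases, and the `only_digits=False` choice is an assumption matching the full-character EMNIST task.

```python
# Sketch: loading the four federated datasets via TensorFlow Federated's
# simulation API. Each call returns ClientData objects keyed by client ID.
import tensorflow_federated as tff

cifar_train, cifar_test = tff.simulation.datasets.cifar100.load_data()
emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data(
    only_digits=False)  # assumption: full 62-class EMNIST split
shakespeare_train, shakespeare_test = (
    tff.simulation.datasets.shakespeare.load_data())
so_train, so_heldout, so_test = (
    tff.simulation.datasets.stackoverflow.load_data())

# Example: materialize one client's examples as a tf.data.Dataset.
client_id = emnist_train.client_ids[0]
client_ds = emnist_train.create_tf_dataset_for_client(client_id)
```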
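Finally, for the Dataset Splits and Experiment Setup rows, here is a sketch of the described learning-rate selection, assuming a hypothetical `train_and_validate` helper (not part of the paper's code) that runs federated training for the given number of rounds and returns average validation performance.

```python
# Sketch of the learning-rate grid search described above: sweep client and
# server learning rates over {10^i | -3 <= i <= 1}, run T = 1500 rounds with
# M = 50 clients per round and E = 1 local epoch, and keep the pair that
# maximizes mean validation performance over 5 random trials.
import itertools
import statistics

GRID = [10.0 ** i for i in range(-3, 2)]  # 1e-3, 1e-2, 1e-1, 1e0, 1e1
T_ROUNDS, COHORT_SIZE, LOCAL_EPOCHS, TRIALS = 1500, 50, 1, 5

def tune_learning_rates(train_and_validate):
    """Return the (client_lr, server_lr) pair with the best mean validation score."""
    best_score, best_lrs = float("-inf"), None
    for client_lr, server_lr in itertools.product(GRID, GRID):
        scores = [
            train_and_validate(
                client_lr=client_lr, server_lr=server_lr,
                rounds=T_ROUNDS, cohort_size=COHORT_SIZE,
                local_epochs=LOCAL_EPOCHS, seed=trial)
            for trial in range(TRIALS)
        ]
        mean_score = statistics.mean(scores)
        if mean_score > best_score:
            best_score, best_lrs = mean_score, (client_lr, server_lr)
    return best_lrs, best_score
```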