Throughput-Optimal Topology Design for Cross-Silo Federated Learning

Authors: Othmane Marfoq, Chuan Xu, Giovanni Neglia, Richard Vidal

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We adapted PyTorch with the MPI backend to run DPASGD (see (2)) on a GPU cluster. We also developed a separate network simulator that takes as input an arbitrary underlay topology described in the Graph Modelling Language [36] and silos computation times and calculates the time instants at which local models wi(k) are computed according to (2) (Appendix F). While PyTorch trains the model as fast as the cluster permits, the network simulator reconstructs the real timeline on the considered underlay. The code is available at https://github.com/omarfoq/communication-in-cross-silo-fl. We considered three real topologies from Rocketfuel engine [94] (Exodus and Ebone) and from The Internet Topology Zoo [48] (Géant), and two synthetic topologies (AWS North-America and Gaia) built from the geographical locations of AWS data centers [38, 96] (Table 3).
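For context, the DPASGD update referenced above alternates local mini-batch gradient steps with periodic consensus averaging over a consensus matrix A. The following is a minimal NumPy sketch, not the paper's implementation: the function name `dpasgd`, the toy quadratic losses, and the two-silo overlay are all illustrative assumptions.

```python
import numpy as np

def dpasgd(A, grads, w0, lr=0.1, s=1, num_rounds=50):
    """Sketch of Decentralized Periodic Averaging SGD (assumed form).

    A          : (n, n) doubly stochastic consensus matrix over the overlay
    grads      : list of n per-silo gradient functions grad_i(w)
    w0         : (n, d) initial local models, one row per silo
    s          : local gradient steps between consensus rounds
    num_rounds : number of communication rounds
    """
    w = w0.copy()
    n = len(grads)
    for _ in range(num_rounds):
        # s local mini-batch gradient steps at every silo
        for _ in range(s):
            for i in range(n):
                w[i] -= lr * grads[i](w[i])
        # periodic averaging with overlay neighbors, weighted by A
        w = A @ w
    return w

# toy usage: two silos minimizing (w - t_i)^2 over a complete overlay,
# with targets t_0 = 1 and t_1 = -1
A = np.array([[0.5, 0.5], [0.5, 0.5]])
grads = [lambda w: 2 * (w - 1.0), lambda w: 2 * (w + 1.0)]
w = dpasgd(A, grads, w0=np.zeros((2, 1)))
```

With a doubly stochastic A, the averaging step drives the silos toward consensus on the minimizer of the average loss (here 0, the mean of the two targets).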
Researcher Affiliation | Collaboration | Othmane Marfoq (Inria, Université Côte d'Azur, and Accenture Labs, Sophia Antipolis, France; othmane.marfoq@inria.fr); Chuan Xu (Inria, Université Côte d'Azur, Sophia Antipolis, France; chuan.xu@inria.fr); Giovanni Neglia (Inria, Université Côte d'Azur, Sophia Antipolis, France; giovanni.neglia@inria.fr); Richard Vidal (Accenture Labs, Sophia Antipolis, France; richard.vidal@accenture.com)
Pseudocode | Yes | We propose Algorithm 1 (see Appendix D), which combines existing approximation algorithms for δ-MBST on a particular graph built from Gc.
Open Source Code | Yes | The code is available at https://github.com/omarfoq/communication-in-cross-silo-fl.
Open Datasets | Yes | We considered three real topologies from Rocketfuel engine [94] (Exodus and Ebone) and from The Internet Topology Zoo [48] (Géant), and two synthetic topologies (AWS North-America and Gaia) built from the geographical locations of AWS data centers [38, 96] (Table 3). We evaluated our solutions on three standard federated datasets from LEAF [14] and on the iNaturalist dataset [99] with geolocalized images from over 8,000 different species of plants and animals (Table 2). Shakespeare [14, 72] Next-Character Prediction, FEMNIST [14] Image classification, Sentiment140 [30] Sentiment analysis, iNaturalist [99] Image classification.
Dataset Splits | No | The paper mentions generating non-iid data distributions and assigning data to silos, but it does not specify explicit train/validation/test splits (e.g., percentages or counts) or refer to standard predefined splits for the datasets used.
Hardware Specification | Yes | We adapted PyTorch with the MPI backend to run DPASGD (see (2)) on a GPU cluster. Mini-batch gradient computation time with NVIDIA Tesla P100.
Software Dependencies | No | The paper mentions "PyTorch with the MPI backend" but does not specify version numbers for PyTorch or MPI, which are necessary for full reproducibility.
Experiment Setup | Yes | One local computation step (s = 1). MATCHA's parameter Cb equals 0.5 as in experiments in [104]. The consensus matrix A is selected according to the local-degree rule [62].
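The local-degree rule cited above assigns each overlay edge a weight determined only by the degrees of its two endpoints. The sketch below assumes one common formulation (Metropolis-Hastings-style weights, A_ij = 1/(1 + max(d_i, d_j)), with the diagonal chosen so rows sum to one); the actual rule in [62] may differ in detail.

```python
import numpy as np

def local_degree_matrix(adj):
    """Consensus matrix from an adjacency matrix via local-degree weights.

    Assumed formulation: for each edge (i, j),
        A[i, j] = 1 / (1 + max(d_i, d_j)),
    and A[i, i] is set so that each row sums to one.
    Requires only each node's own degree and its neighbors' degrees,
    so it can be computed locally at every silo.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    n = adj.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j]:
                A[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        A[i, i] = 1.0 - A[i].sum()
    return A

# usage: path overlay on three silos, 0 - 1 - 2
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
A = local_degree_matrix(adj)
```

Because the weight of edge (i, j) is symmetric in i and j, the resulting A is symmetric with unit row sums, hence doubly stochastic, which is the property the consensus averaging step relies on.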