Throughput-Optimal Topology Design for Cross-Silo Federated Learning
Authors: Othmane Marfoq, Chuan Xu, Giovanni Neglia, Richard Vidal
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We adapted PyTorch with the MPI backend to run DPASGD (see (2)) on a GPU cluster. We also developed a separate network simulator that takes as input an arbitrary underlay topology described in the Graph Modelling Language [36] and silos' computation times, and calculates the time instants at which local models w_i(k) are computed according to (2) (Appendix F). While PyTorch trains the model as fast as the cluster permits, the network simulator reconstructs the real timeline on the considered underlay. The code is available at https://github.com/omarfoq/communication-in-cross-silo-fl. We considered three real topologies from the Rocketfuel engine [94] (Exodus and Ebone) and from The Internet Topology Zoo [48] (Géant), and two synthetic topologies (AWS North-America and Gaia) built from the geographical locations of AWS data centers [38, 96] (Table 3). |
| Researcher Affiliation | Collaboration | Othmane Marfoq, Inria, Université Côte d'Azur, Accenture Labs, Sophia Antipolis, France (othmane.marfoq@inria.fr); Chuan Xu, Inria, Université Côte d'Azur, Sophia Antipolis, France (chuan.xu@inria.fr); Giovanni Neglia, Inria, Université Côte d'Azur, Sophia Antipolis, France (giovanni.neglia@inria.fr); Richard Vidal, Accenture Labs, Sophia Antipolis, France (richard.vidal@accenture.com) |
| Pseudocode | Yes | We propose Algorithm 1 (see Appendix D), which combines existing approximation algorithms for δ-MBST on a particular graph built from Gc. |
| Open Source Code | Yes | The code is available at https://github.com/omarfoq/communication-in-cross-silo-fl. |
| Open Datasets | Yes | We considered three real topologies from the Rocketfuel engine [94] (Exodus and Ebone) and from The Internet Topology Zoo [48] (Géant), and two synthetic topologies (AWS North-America and Gaia) built from the geographical locations of AWS data centers [38, 96] (Table 3). We evaluated our solutions on three standard federated datasets from LEAF [14] and on the iNaturalist dataset [99] with geolocalized images from over 8,000 different species of plants and animals (Table 2). Shakespeare [14, 72]: next-character prediction; FEMNIST [14]: image classification; Sentiment140 [30]: sentiment analysis; iNaturalist [99]: image classification. |
| Dataset Splits | No | The paper mentions generating non-iid data distributions and assigning data to silos, but it does not specify explicit train/validation/test splits (e.g., percentages or counts) or refer to standard predefined splits for the datasets used. |
| Hardware Specification | Yes | We adapted PyTorch with the MPI backend to run DPASGD (see (2)) on a GPU cluster. Mini-batch gradient computation time with NVIDIA Tesla P100. |
| Software Dependencies | No | The paper mentions "PyTorch with the MPI backend" but does not specify version numbers for PyTorch or MPI, which are necessary for full reproducibility. |
| Experiment Setup | Yes | One local computation step (s = 1). MATCHA's parameter C_b equals 0.5, as in the experiments in [104]. The consensus matrix A is selected according to the local-degree rule [62]. |
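The experiment-setup row mentions selecting the consensus matrix A by the local-degree rule. As a minimal sketch of how such a matrix can be built from a topology's adjacency matrix, the snippet below uses Metropolis-style local-degree weights (A[i][j] = 1 / (1 + max(deg_i, deg_j)) on edges, with the diagonal absorbing the remaining mass). This is an illustrative assumption; the exact rule in the paper's reference [62] may differ in detail.

```python
import numpy as np

def local_degree_consensus(adjacency: np.ndarray) -> np.ndarray:
    """Build a symmetric, doubly stochastic consensus matrix from an
    undirected adjacency matrix using Metropolis-style local-degree
    weights (an assumed variant of the local-degree rule):
      A[i][j] = 1 / (1 + max(deg_i, deg_j))  for each edge (i, j),
      A[i][i] = 1 - sum of row i's off-diagonal weights.
    """
    n = adjacency.shape[0]
    deg = adjacency.sum(axis=1)          # node degrees
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adjacency[i, j]:
                A[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        A[i, i] = 1.0 - A[i].sum()       # diagonal keeps rows stochastic
    return A

# Example: a 4-node ring topology (every node has degree 2).
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]], dtype=float)
A = local_degree_consensus(ring)
```

On the ring, each edge weight is 1/3 and each diagonal entry is 1/3, so every row sums to one and the matrix is symmetric, which is what makes repeated multiplication by A average the silos' local models.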