Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Beyond Spectral Gap: The Role of the Topology in Decentralized Learning
Authors: Thijs Vogels, Hadrien Hendrikx, Martin Jaggi
JMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theory matches empirical observations in deep learning, and accurately describes the relative merits of different graph topologies. This paper is an extension of the conference paper by Vogels et al. (2022). Code: github.com/epfml/topology-in-decentralized-learning. ... We quantify the role of the graph in a quadratic toy problem designed to mimic the initial phase of deep learning (Section 3.1), showing that averaging enables a larger learning rate. ... These insights prove to be relevant in deep learning, where we accurately describe the performance of a variety of topologies, while their spectral gap does not (Section 5). ... 5. Empirical Relevance in Deep Learning ... We experiment with a variety of 32-worker topologies on Cifar-10 (Krizhevsky et al.) with a VGG-11 model (Simonyan and Zisserman, 2015). ... Figure 4 shows the loss reached after the first 2.5k SGD steps for all topologies and for a dense grid of learning rates. |
| Researcher Affiliation | Academia | Thijs Vogels EMAIL ... Hadrien Hendrikx EMAIL ... Martin Jaggi EMAIL ... Machine Learning and Optimization Laboratory EPFL Lausanne, Switzerland. Both EPFL (École Polytechnique Fédérale de Lausanne) and Inria (Institut national de recherche en sciences et technologies du numérique) are academic or public research institutions. |
| Pseudocode | No | The paper describes the D-SGD algorithm mathematically as: "(D-SGD): $x_i^{(t+1)} = \sum_{j=1}^{n} w_{ij} x_j^{(t)} - \eta \nabla f_{\xi_i^{(t)}}(x_i^{(t)})$". This is a mathematical expression of the algorithm's update step rather than a structured pseudocode block. No explicit section or figure is labeled "Pseudocode" or "Algorithm". |
| Open Source Code | Yes | Code: github.com/epfml/topology-in-decentralized-learning. |
| Open Datasets | Yes | We experiment with a variety of 32-worker topologies on Cifar-10 (Krizhevsky et al.) with a VGG-11 model (Simonyan and Zisserman, 2015). ... There, we use larger graphs (of 64 workers), a different model and data set (an MLP on Fashion MNIST (Xiao et al., 2017)) |
| Dataset Splits | No | The paper mentions training for "2.5k SGD steps" and "25 epochs" on Cifar-10 and refers to "Appendix E of (Vogels et al., 2022) for full details on the experimental setup." However, the provided text does not explicitly give train/validation/test split percentages, sample counts, or a split methodology. |
| Hardware Specification | No | The paper mentions running experiments on "data centers" but does not provide any specific details about the hardware used, such as GPU/CPU models, memory specifications, or cloud instance types. |
| Software Dependencies | No | The paper discusses using models like VGG-11 and MLP, but it does not specify any software libraries or frameworks (e.g., PyTorch, TensorFlow, CUDA) along with their version numbers that would be required to replicate the experiments. |
| Experiment Setup | Yes | Figure 4 shows the loss reached after the first 2.5k SGD steps for all topologies and for a dense grid of learning rates. ... We focus on the initial phase of training, 25k steps in our case, where both train and test loss converge close to linearly. Using a large learning rate in this phase is found to be important for good generalization (Li et al., 2019; Wang et al., 2022). ... we use larger graphs (of 64 workers), a different model and data set (an MLP on Fashion MNIST (Xiao et al., 2017)), and no momentum or weight decay. |
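The D-SGD update quoted in the Pseudocode row combines a gossip-averaging step over a mixing matrix $W$ with a local stochastic gradient step. A minimal numpy sketch of that update is below; the ring topology, the toy quadratic objectives, and all function names are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def ring_gossip_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n workers.

    Each worker averages with equal weight 1/3 over itself and its two
    ring neighbors (an illustrative choice, not the paper's).
    """
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1 / 3
        W[i, (i - 1) % n] = 1 / 3
        W[i, (i + 1) % n] = 1 / 3
    return W

def dsgd_step(X, W, grads, lr):
    """One D-SGD update: x_i <- sum_j w_ij x_j - lr * grad_i.

    X has one row per worker; the matrix product W @ X performs the
    gossip averaging for all workers at once.
    """
    return W @ X - lr * grads

# Toy heterogeneous quadratics f_i(x) = 0.5 * ||x - b_i||^2,
# so each worker's gradient is simply x_i - b_i.
rng = np.random.default_rng(0)
n, d = 8, 4
W = ring_gossip_matrix(n)
X = rng.standard_normal((n, d))       # per-worker iterates
targets = rng.standard_normal((n, d)) # per-worker optima b_i
for _ in range(200):
    grads = X - targets
    X = dsgd_step(X, W, grads, lr=0.1)
```

Because $W$ is doubly stochastic, the average iterate follows plain gradient descent on the average objective, so the workers' mean converges to the mean of the $b_i$ even though each worker only communicates with its ring neighbors.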