Epidemic Learning: Boosting Decentralized Learning with Randomized Communication
Authors: Martijn De Vos, Sadegh Farhadkhani, Rachid Guerraoui, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate EL in a 96-node network and compare its performance with state-of-the-art DL approaches. Our results illustrate that EL converges up to 1.7× quicker than baseline DL algorithms and attains 2.2% higher accuracy for the same communication volume. Our theoretical analysis in Section 3 shows that EL surpasses the best-known static and randomized topologies in terms of convergence speed. |
| Researcher Affiliation | Academia | Martijn de Vos, Sadegh Farhadkhani, Rachid Guerraoui, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma — EPFL, Switzerland |
| Pseudocode | Yes | Algorithm 1 Epidemic Learning as executed by a node i |
| Open Source Code | Yes | Source code can be found at https://github.com/sacs-epfl/decentralizepy/releases/tag/epidemic-neurips-2023. |
| Open Datasets | Yes | We evaluate the baseline algorithms using the CIFAR-10 image classification dataset [26] and the FEMNIST dataset, the latter being part of the LEAF benchmark [7]. |
| Dataset Splits | No | The step size (γ) was tuned by running each baseline on a range of values and taking the setup with the best validation accuracy. |
| Hardware Specification | Yes | We perform experiments on 6 hyperthreading-enabled machines with dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz of 8 cores. |
| Software Dependencies | Yes | Both EL-Oracle and EL-Local were implemented using the DecentralizePy framework [11] and Python 3.8. |
| Experiment Setup | Yes | We deploy 96 DL nodes for each experiment, interconnected according to the evaluated topologies. When experimenting with s-regular topologies, each node maintains a fixed degree of ⌈log2(n)⌉, i.e., each node has 7 neighbors. ... The step size (γ) was tuned by running each baseline on a range of values and taking the setup with the best validation accuracy. The optimal step sizes are γ = 0.1 for Fully connected and 8-U-EquiStatic, and γ = 0.05 for the remaining algorithms and topologies over CIFAR-10. For FEMNIST, the optimal step size is γ = 0.1 for all algorithms and topologies. Table 3 provides a summary of the learning parameters, model, and dataset used in the experiments. (e.g., batch size (b) = 8, training steps per round (r) = 3 for CIFAR-10) |
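To make the communication pattern behind these rows concrete, here is a minimal sketch of one EL-Local-style round: each node performs a local update, pushes its model to s peers sampled uniformly at random, and averages its own model with whatever it received. This is an illustrative toy (scalar "models", a quadratic objective, and the names `el_local_round` and `grad` are hypothetical), not the paper's implementation; the actual experiments use n = 96 nodes, degree s = 7, and neural networks trained in DecentralizePy.

```python
import random

def el_local_round(models, s, grad, lr=0.1):
    """One sketched round of EL-Local: a local gradient step per node,
    then each node pushes its model to s uniformly random peers, and
    every node averages its own model with the models it received."""
    n = len(models)
    # Local training step on each node (here: one gradient step).
    updated = [m - lr * grad(m) for m in models]
    inbox = [[] for _ in range(n)]
    for i in range(n):
        # EL-Local: node i samples s distinct peers (excluding itself)
        # independently each round, so the topology changes every round.
        for j in random.sample([j for j in range(n) if j != i], s):
            inbox[j].append(updated[i])
    # Each node averages its own model with the received models.
    return [(updated[i] + sum(inbox[i])) / (1 + len(inbox[i]))
            for i in range(n)]

if __name__ == "__main__":
    random.seed(0)
    n, s = 16, 4  # toy sizes; the paper uses n = 96, s = 7
    # Toy shared objective f(x) = (x - 3)^2, minimized at x = 3.
    grad = lambda x: 2 * (x - 3.0)
    models = [random.uniform(-10, 10) for _ in range(n)]
    for _ in range(50):
        models = el_local_round(models, s, grad)
    print(all(abs(m - 3.0) < 1e-3 for m in models))
```

The round-varying random peer sampling is the point of the sketch: unlike a static s-regular graph, every round mixes models across a fresh set of edges, which is what the paper's analysis credits for the faster convergence.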