Epidemic Learning: Boosting Decentralized Learning with Randomized Communication
Authors: Martijn De Vos, Sadegh Farhadkhani, Rachid Guerraoui, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate EL in a 96-node network and compare its performance with state-of-the-art DL approaches. Our results illustrate that EL converges up to 1.7× quicker than baseline DL algorithms and attains 2.2% higher accuracy for the same communication volume. Our theoretical analysis in Section 3 shows that EL surpasses the best-known static and randomized topologies in terms of convergence speed. |
| Researcher Affiliation | Academia | Martijn de Vos, Sadegh Farhadkhani, Rachid Guerraoui, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma — EPFL, Switzerland |
| Pseudocode | Yes | Algorithm 1 Epidemic Learning as executed by a node i |
| Open Source Code | Yes | Source code can be found at https://github.com/sacs-epfl/decentralizepy/releases/tag/epidemic-neurips-2023. |
| Open Datasets | Yes | We evaluate the baseline algorithms using the CIFAR-10 image classification dataset [26] and the FEMNIST dataset, the latter being part of the LEAF benchmark [7]. |
| Dataset Splits | No | The step size (γ) was tuned by running each baseline on a range of values and taking the setup with the best validation accuracy. |
| Hardware Specification | Yes | We perform experiments on 6 hyperthreading-enabled machines with dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz of 8 cores. |
| Software Dependencies | Yes | Both EL-Oracle and EL-Local were implemented using the DecentralizePy framework [11] and Python 3.8. |
| Experiment Setup | Yes | We deploy 96 DL nodes for each experiment, interconnected according to the evaluated topologies. When experimenting with s-regular topologies, each node maintains a fixed degree of ⌈log2(n)⌉, i.e., each node has 7 neighbors. ... The step size (γ) was tuned by running each baseline on a range of values and taking the setup with the best validation accuracy. The optimal step sizes are γ = 0.1 for Fully connected and 8-U-EquiStatic, and γ = 0.05 for the remaining algorithms and topologies over CIFAR-10. For FEMNIST, the optimal step size is γ = 0.1 for all algorithms and topologies. Table 3 provides a summary of the learning parameters, model, and dataset used in the experiments. (e.g., batch size (b) = 8, training steps per round (r) = 3 for CIFAR-10) |
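To make the communication pattern behind these rows concrete, here is a minimal sketch of one EL-Local-style round: each node performs a local update, pushes its model to s peers sampled uniformly at random, and averages its own model with whatever it received. This is an illustrative toy (scalar "models", a quadratic objective, and the names `el_local_round` and `grad` are hypothetical), not the paper's implementation; the actual experiments use n = 96 nodes, degree s = 7, and neural networks trained in DecentralizePy.

```python
import random

def el_local_round(models, s, grad, lr=0.1):
    """One sketched round of EL-Local: a local gradient step per node,
    then each node pushes its model to s uniformly random peers, and
    every node averages its own model with the models it received."""
    n = len(models)
    # Local training step on each node (here: one gradient step).
    updated = [m - lr * grad(m) for m in models]
    inbox = [[] for _ in range(n)]
    for i in range(n):
        # EL-Local: node i samples s distinct peers (excluding itself)
        # independently each round, so the topology changes every round.
        for j in random.sample([j for j in range(n) if j != i], s):
            inbox[j].append(updated[i])
    # Each node averages its own model with the received models.
    return [(updated[i] + sum(inbox[i])) / (1 + len(inbox[i]))
            for i in range(n)]

if __name__ == "__main__":
    random.seed(0)
    n, s = 16, 4  # toy sizes; the paper uses n = 96, s = 7
    # Toy shared objective f(x) = (x - 3)^2, minimized at x = 3.
    grad = lambda x: 2 * (x - 3.0)
    models = [random.uniform(-10, 10) for _ in range(n)]
    for _ in range(50):
        models = el_local_round(models, s, grad)
    print(all(abs(m - 3.0) < 1e-3 for m in models))
```

The round-varying random peer sampling is the point of the sketch: unlike a static s-regular graph, every round mixes models across a fresh set of edges, which is what the paper's analysis credits for the faster convergence.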