Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
$\textbf{A}^2\textbf{CiD}^2$: Accelerating Asynchronous Communication in Decentralized Deep Learning
Authors: Adel Nabli, Eugene Belilovsky, Edouard Oyallon
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Numerical Experiments. Now, we experimentally compare $\textbf{A}^2\textbf{CiD}^2$ to a synchronous baseline All-Reduce SGD (AR-SGD, see [26]) and an asynchronous baseline using randomized pairwise communications (a variant of AD-PSGD [28], traditionally used in state-of-the-art decentralized asynchronous training of DNNs). |
| Researcher Affiliation | Academia | Adel Nabli (Concordia University, Mila; Sorbonne University, ISIR, CNRS); Eugene Belilovsky (Concordia University, Mila); Edouard Oyallon (Sorbonne University, ISIR, CNRS) |
| Pseudocode | Yes | Algorithm 1: This algorithm block describes our implementation of our Asynchronous algorithm with $\textbf{A}^2\textbf{CiD}^2$ on each local machine. |
| Open Source Code | Yes | Our code is implemented in Pytorch [35], remove locks put on previous asynchronous implementations by circumventing their deadlocks, and can be found in an open-source repository: https://github.com/AdelNabli/ACiD. |
| Open Datasets | Yes | Following [2], we pick a ResNet18 for CIFAR-10 [24] and ResNet50 for ImageNet [11]. |
| Dataset Splits | No | The paper mentions using CIFAR-10 and ImageNet and states that for the asynchronous setting, they give 'access to the whole dataset to all workers, each one shuffling it with a different random seed,' rather than splitting it. It does not provide explicit percentages or sample counts for training, validation, or test splits. |
| Hardware Specification | Yes | In particular, we show consistent improvement on the ImageNet dataset using up to 64 asynchronous workers (A100 GPUs) and various communication network topologies. |
| Software Dependencies | No | The paper mentions 'Pytorch [35]' as the implementation framework but does not specify a version number for Pytorch or any other software dependency. |
| Experiment Setup | Yes | We fixed the local batch size to 128 on both CIFAR-10 and ImageNet. We use SGD with a base learning rate of 0.1, a momentum value set at 0.9 and $5 \times 10^{-4}$ for weight decay. As advocated in [16], we do not apply weight decay on the learnable batch-norm coefficients. For ImageNet training with the SGD baseline, we decay the learning-rate by a factor of 10 at epochs 30, 60, 80 (epochs 50, 75 for CIFAR-10), and apply an analogous decay schedule with our asynchronous decentralized methods. |
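
The step schedule quoted in the Experiment Setup row can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' code: the names `MILESTONES` and `step_lr` are hypothetical, epochs are assumed to be 0-indexed, and the decay is assumed to take effect from the milestone epoch itself. The paper states only the milestones and the factor of 10.

```python
from bisect import bisect_right

# Hyperparameters quoted in the Experiment Setup row.
BASE_LR = 0.1
MOMENTUM = 0.9
WEIGHT_DECAY = 5e-4
LOCAL_BATCH_SIZE = 128

# Epochs at which the learning rate is divided by 10 (from the quoted text).
MILESTONES = {"imagenet": (30, 60, 80), "cifar10": (50, 75)}


def step_lr(epoch, dataset, base_lr=BASE_LR, factor=10.0):
    """Return the learning rate for a 0-indexed epoch (hypothetical helper).

    Assumption: the decay applies from the milestone epoch onward,
    so epoch 30 on ImageNet already uses the reduced rate.
    """
    # Count how many milestones are <= epoch.
    n_decays = bisect_right(MILESTONES[dataset], epoch)
    return base_lr / factor ** n_decays


if __name__ == "__main__":
    for epoch in (0, 29, 30, 60, 80):
        print(epoch, step_lr(epoch, "imagenet"))
```

Under these assumptions the ImageNet run uses four learning-rate plateaus and the CIFAR-10 run three; the asynchronous methods are described only as using an "analogous" schedule, so their exact milestones are not captured here.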