Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Asynchronous Decentralized Parallel Stochastic Gradient Descent
Authors: Xiangru Lian, Wei Zhang, Ce Zhang, Ji Liu
ICML 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, AD-PSGD outperforms the best of decentralized parallel SGD (D-PSGD), asynchronous parallel SGD (APSGD), and standard data parallel SGD (AllReduce-SGD), often by orders of magnitude in a heterogeneous environment. When training ResNet-50 on ImageNet with up to 128 GPUs, AD-PSGD converges (w.r.t. epochs) similarly to AllReduce-SGD, but each epoch can be up to 4-8× faster than its synchronous counterparts in a network-sharing HPC environment. (Section 5, Experiments) |
| Researcher Affiliation | Collaboration | Xiangru Lian (1, *), Wei Zhang (2, *), Ce Zhang (3), Ji Liu (4). (1) Department of Computer Science, University of Rochester; (2) IBM T. J. Watson Research Center; (3) Department of Computer Science, ETH Zurich; (4) Tencent AI Lab, Seattle, USA. |
| Pseudocode | Yes | Algorithm 1 AD-PSGD (logical view) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We use CIFAR10 and ImageNet-1K as the evaluation dataset and we use Torch-7 as our deep learning framework. |
| Dataset Splits | No | The paper mentions using the CIFAR10 and ImageNet-1K datasets, but does not explicitly provide the training/validation/test splits or percentages, or refer to predefined splits with citations. |
| Hardware Specification | Yes | IBM S822LC HPC cluster: Each node with 4 Nvidia P100 GPUs, 160 Power8 cores (8-way SMT) and 500GB memory on each node. 100Gbit/s Mellanox EDR infiniband network. We use 32 such nodes. x86-based cluster: This cluster is a cloud-like environment with 10Gbit/s ethernet connection. Each node has 4 Nvidia P100 GPUs, 56 Xeon E5-2680 cores (2-way SMT), and 1TB DRAM. We use 4 such nodes. |
| Software Dependencies | No | The paper mentions 'Torch-7 as our deep learning framework' and 'MPI to implement the communication scheme', but it does not specify version numbers for these software components. |
| Experiment Setup | Yes | Batch size: 128 per worker for VGG, 32 for ResNet-20. Learning rate: For VGG, start from 1 and reduce by half every 25 epochs. For ResNet-20, start from 0.1 and decay by a factor of 10 at the 81st epoch and the 122nd epoch. Momentum: 0.9. Weight decay: 10^-4. |
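
The two learning-rate schedules quoted in the Experiment Setup row can be written out as step functions. The following is a minimal sketch, not the authors' code: the function names `vgg_lr` and `resnet20_lr` are hypothetical, and it assumes epochs are counted from 1 and that each decay takes effect at the start of the named epoch, which the quoted text does not state.

```python
def vgg_lr(epoch: int) -> float:
    """VGG schedule: start from 1 and halve every 25 epochs.

    Assumption: `epoch` is 1-indexed, so epochs 1-25 use 1.0,
    epochs 26-50 use 0.5, and so on.
    """
    if epoch < 1:
        raise ValueError("epoch is 1-indexed in this sketch")
    return 0.5 ** ((epoch - 1) // 25)


def resnet20_lr(epoch: int) -> float:
    """ResNet-20 schedule: start from 0.1, divide by 10 at epochs 81 and 122.

    Assumption: `epoch` is 1-indexed and the decay applies from the
    start of the 81st and 122nd epochs onward.
    """
    if epoch < 1:
        raise ValueError("epoch is 1-indexed in this sketch")
    if epoch >= 122:
        return 0.001
    if epoch >= 81:
        return 0.01
    return 0.1


# Other hyperparameters quoted in the table, collected for reference.
# The dictionary layout itself is illustrative only.
SETUP = {
    "vgg": {"batch_size_per_worker": 128, "momentum": 0.9, "weight_decay": 1e-4},
    "resnet20": {"batch_size_per_worker": 32, "momentum": 0.9, "weight_decay": 1e-4},
}

if __name__ == "__main__":
    for e in (1, 25, 26, 51, 81, 122):
        print(e, vgg_lr(e), resnet20_lr(e))
```

If the original training scripts counted epochs from 0 or applied the decay one epoch later, the boundaries above shift by one epoch; the table alone does not settle this.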