Stability-Based Generalization Analysis of the Asynchronous Decentralized SGD
Authors: Xiaoge Deng, Tao Sun, Shengwei Li, Dongsheng Li
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct extensive experiments on MNIST, CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets to validate the theoretical findings. |
| Researcher Affiliation | Academia | National Lab for Parallel and Distributed Processing (PDL), College of Computer, National University of Defense Technology, Changsha, Hunan, China. dengxg@nudt.edu.cn, nudtsuntao@163.com, lucasleesw9@gmail.com, dsli@nudt.edu.cn |
| Pseudocode | No | The paper describes the AD-SGD algorithm in numbered steps within a paragraph, but it does not present a formal pseudocode block or a clearly labeled 'Algorithm' figure. |
| Open Source Code | No | The paper does not provide any statement or link regarding the public availability of source code for the described methodology. |
| Open Datasets | Yes | Finally, we conduct extensive experiments on MNIST, CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets to validate the theoretical findings. |
| Dataset Splits | No | The paper reports training and testing errors but does not explicitly specify train/validation/test splits or the percentages used for partitioning the data. |
| Hardware Specification | Yes | The experiments are conducted on four physical machines with a total of 16 distributed computing workers. Each machine is equipped with four Nvidia RTX-3090 24 GB GPUs, two Intel Xeon 4214R @2.40 GHz CPUs and 128 GB DDR4 RAM, and the machines are connected via 100 Gbps InfiniBand. |
| Software Dependencies | No | The paper states 'All our experimental results are based on a PyTorch (Paszke et al. 2019) implementation with the NCCL backend.' but does not specify version numbers for PyTorch or NCCL. |
| Experiment Setup | Yes | The local training batch size is set to 256 for all experiments. We focus on exploring the role played by learning rates, asynchronous delays, and decentralized topologies. To make the results more interpretable, we avoid other training techniques such as warmup or weight decay. |
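The experiment-setup row above describes a deliberately plain training configuration (local batch size 256, no warmup, no weight decay, PyTorch with the NCCL backend). The snippet below is a minimal sketch of what such a single-worker configuration could look like; it is not the authors' code, and the dataset choice (CIFAR-10 via torchvision), model (ResNet-18), and learning rate are placeholder assumptions, since the paper sweeps learning rates, delays, and topologies.

```python
# Minimal sketch (not the authors' implementation): a plain SGD setup consistent
# with the stated configuration -- batch size 256, no warmup, no weight decay.
import torch
import torchvision
import torchvision.transforms as T

transform = T.Compose([T.ToTensor()])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=256, shuffle=True)  # local batch size 256, per the paper

model = torchvision.models.resnet18(num_classes=10)  # placeholder model choice
# Plain SGD with weight_decay=0.0 and no warmup schedule; the learning rate here
# is an arbitrary placeholder, since the paper studies its effect explicitly.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.0, weight_decay=0.0)
criterion = torch.nn.CrossEntropyLoss()

for images, labels in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    break  # one illustrative step; the asynchronous decentralized part is not shown
```

The distributed, asynchronous, and topology-dependent aspects of AD-SGD (16 workers, NCCL communication, delayed gradient application) are outside the scope of this sketch, which only illustrates the local optimizer and data-loading choices the table reports.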