Stability-Based Generalization Analysis of the Asynchronous Decentralized SGD
Authors: Xiaoge Deng, Tao Sun, Shengwei Li, Dongsheng Li
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct extensive experiments on MNIST, CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets to validate the theoretical findings. |
| Researcher Affiliation | Academia | National Lab for Parallel and Distributed Processing (PDL), College of Computer, National University of Defense Technology, Changsha, Hunan, China. dengxg@nudt.edu.cn, nudtsuntao@163.com, lucasleesw9@gmail.com, dsli@nudt.edu.cn |
| Pseudocode | No | The paper describes the AD-SGD algorithm in numbered steps within a paragraph, but it does not present a formal pseudocode block or a clearly labeled 'Algorithm' figure. |
| Open Source Code | No | The paper does not provide any statement or link regarding the public availability of source code for the described methodology. |
| Open Datasets | Yes | Finally, we conduct extensive experiments on MNIST, CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets to validate the theoretical findings. |
| Dataset Splits | No | The paper reports training and testing errors but does not explicitly specify train/validation/test splits or the percentages used for partitioning the data. |
| Hardware Specification | Yes | The experiments are conducted on four physical machines with a total of 16 distributed computing workers. Each machine is equipped with four Nvidia RTX-3090 24 GB GPUs, two Intel Xeon 4214R @2.40 GHz CPUs and 128 GB DDR4 RAM, and the machines are connected via 100 Gbps InfiniBand. |
| Software Dependencies | No | The paper states 'All our experimental results are based on a PyTorch (Paszke et al. 2019) implementation with the NCCL backend.' but does not specify version numbers for PyTorch or NCCL. |
| Experiment Setup | Yes | The local training batch size is set to 256 for all experiments. We focus on exploring the role played by learning rates, asynchronous delays, and decentralized topologies. To make the results more interpretable, we avoid other training techniques such as warmup or weight decay. |
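The experiment-setup row above describes a deliberately plain training configuration (local batch size 256, no warmup, no weight decay, PyTorch with the NCCL backend). The snippet below is a minimal sketch of what such a single-worker configuration could look like; it is not the authors' code, and the dataset choice (CIFAR-10 via torchvision), model (ResNet-18), and learning rate are placeholder assumptions, since the paper sweeps learning rates, delays, and topologies.

```python
# Minimal sketch (not the authors' implementation): a plain SGD setup consistent
# with the stated configuration -- batch size 256, no warmup, no weight decay.
import torch
import torchvision
import torchvision.transforms as T

transform = T.Compose([T.ToTensor()])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=256, shuffle=True)  # local batch size 256, per the paper

model = torchvision.models.resnet18(num_classes=10)  # placeholder model choice
# Plain SGD with weight_decay=0.0 and no warmup schedule; the learning rate here
# is an arbitrary placeholder, since the paper studies its effect explicitly.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.0, weight_decay=0.0)
criterion = torch.nn.CrossEntropyLoss()

for images, labels in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    break  # one illustrative step; the asynchronous decentralized part is not shown
```

The distributed, asynchronous, and topology-dependent aspects of AD-SGD (16 workers, NCCL communication, delayed gradient application) are outside the scope of this sketch, which only illustrates the local optimizer and data-loading choices the table reports.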