Topology-aware Generalization of Decentralized SGD

Authors: Tongtian Zhu, Fengxiang He, Lan Zhang, Zhengyang Niu, Mingli Song, Dacheng Tao

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments of VGG-11 and ResNet-18 on CIFAR-10, CIFAR-100 and Tiny-ImageNet justify our theory.
Researcher Affiliation | Collaboration | 1. College of Computer Science and Technology, Zhejiang University; 2. Shanghai Institute for Advanced Study of Zhejiang University; 3. JD Explore Academy, JD.com Inc.; 4. School of Computer Science and Technology, University of Science and Technology of China; 5. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center; 6. School of Computer Science, Wuhan University; 7. Zhejiang University City College.
Pseudocode | No | The paper describes an update rule in Equation (1) but does not provide a clearly labeled pseudocode or algorithm block (a hedged sketch of the decentralized update is given after this table).
Open Source Code | Yes | Code is available at https://github.com/Raiden-Zhu/Generalization-of-DSGD.
Open Datasets | Yes | The models are trained on CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009) and Tiny ImageNet (Le & Yang, 2015), three popular benchmark image classification datasets.
Dataset Splits | Yes | The CIFAR-10 dataset consists of 60,000 32×32 color images across 10 classes, with each class containing 5,000 training and 1,000 testing images. The CIFAR-100 dataset also consists of 60,000 32×32 color images, except that it has 100 classes, each class containing 500 training and 100 testing images. Tiny ImageNet contains 120,000 64×64 color images in 200 classes, each class containing 500 training images, 50 validation images, and 50 test images (these per-class counts are sanity-checked after this table).
Hardware Specification | Yes | All our experiments are conducted on a computing cluster with GPUs of NVIDIA Tesla V100 16GB and CPUs of Intel Xeon Gold 6140 @ 2.30GHz.
Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' but does not specify its version number, nor other software dependencies with versions.
Experiment Setup | Yes | The local batch size is set as 64. The initial learning rate is set as 0.1 and will be divided by 10 when the model has accessed 2/5 and 4/5 of the total number of iterations (a sketch of this schedule is given below).
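
Regarding the Pseudocode row: the update rule the paper writes as Equation (1) follows the standard decentralized SGD form, where each worker gossip-averages its neighbors' parameters according to the communication topology and then takes a local gradient step. The following is a minimal sketch of that standard form, not the paper's code; the mixing matrix W and all names are assumptions for illustration.

```python
import numpy as np

def dsgd_step(params, grads, W, lr):
    """One decentralized SGD step (illustrative sketch).

    params: list of per-worker parameter arrays
    grads:  list of per-worker mini-batch gradients at those parameters
    W:      doubly stochastic mixing matrix encoding the topology
    lr:     learning rate
    """
    n = len(params)
    new_params = []
    for i in range(n):
        # gossip-average the neighbors' parameters with weights W[i, j]
        mixed = sum(W[i, j] * params[j] for j in range(n))
        # local SGD step on worker i's own mini-batch gradient
        new_params.append(mixed - lr * grads[i])
    return new_params

# toy usage: 3 workers with a doubly stochastic mixing matrix
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
params = [np.array([1.0]), np.array([2.0]), np.array([3.0])]
grads = [np.array([0.1]), np.array([0.1]), np.array([0.1])]
params = dsgd_step(params, grads, W, lr=0.1)
```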
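The per-class figures in the Dataset Splits row can be checked against the stated dataset totals; a small arithmetic check, assuming the counts quoted above:

```python
# totals implied by the per-class counts in the Dataset Splits row
cifar10 = 10 * (5_000 + 1_000)         # 60,000 images
cifar100 = 100 * (500 + 100)           # 60,000 images
tiny_imagenet = 200 * (500 + 50 + 50)  # 120,000 images
assert cifar10 == 60_000 and cifar100 == 60_000 and tiny_imagenet == 120_000
```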
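For the Experiment Setup row, a minimal sketch of the stated step schedule (initial learning rate 0.1, divided by 10 once the model has accessed 2/5, and again at 4/5, of the total iterations); the function name and arguments below are hypothetical:

```python
def stepped_lr(iteration, total_iterations, base_lr=0.1):
    """Learning rate at a given iteration under the described step schedule."""
    if iteration < 2 * total_iterations / 5:
        return base_lr          # first 2/5 of training: 0.1
    if iteration < 4 * total_iterations / 5:
        return base_lr / 10     # next 2/5 of training: 0.01
    return base_lr / 100        # final 1/5 of training: 0.001
```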