Accelerating Gossip SGD with Periodic Global Averaging

Authors: Yiming Chen, Kun Yuan, Yingya Zhang, Pan Pan, Yinghui Xu, Wotao Yin

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results of large-scale training on image classification (ResNet-50) and language modeling (BERT) validate our theoretical findings.
Researcher Affiliation | Industry | Alibaba Group, Hangzhou, China.
Pseudocode | Yes | Algorithm 1 Gossip-PGA (a minimal sketch of this update appears after the table).
Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the described methodology.
Open Datasets | Yes | The ImageNet-1k (Deng et al., 2009) dataset consists of 1,281,167 training images and 50,000 validation images in 1000 classes.
Dataset Splits | Yes | The ImageNet-1k (Deng et al., 2009) dataset consists of 1,281,167 training images and 50,000 validation images in 1000 classes.
Hardware Specification | No | The paper mentions training on 256 GPUs and 64 GPUs but does not specify the GPU models (e.g., NVIDIA A100 or V100) or other hardware details such as CPU, memory, or cloud instance types.
Software Dependencies | No | The paper cites PyTorch in its references but does not give version numbers for PyTorch or any other software library used in the experiments.
Experiment Setup | Yes | The learning rate is warmed up over the first 5 epochs and decayed by a factor of 10 at epochs 30, 60, and 90. The averaging period is set to 6 for both Local SGD and Gossip-PGA. In Gossip-AGA, the period starts at 4 and is adapted afterwards; roughly 9% of iterations perform global averaging (restated as code below).
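
To illustrate the Pseudocode row: Gossip-PGA interleaves local SGD steps and gossip averaging with an exact global average every H iterations. Below is a minimal NumPy sketch of that pattern, not a reproduction of the paper's Algorithm 1; the function name gossip_pga, the grad_fn gradient oracle, and the step-then-communicate ordering are illustrative assumptions.

import numpy as np

def gossip_pga(x, grad_fn, W, lr, H, T):
    # x: (n, d) array, row i holds worker i's parameters.
    # grad_fn(i, x_i): stochastic gradient oracle for worker i (assumed interface).
    # W: (n, n) doubly stochastic gossip (mixing) matrix for the topology.
    n = x.shape[0]
    for t in range(T):
        # Every worker takes a local SGD step on its own mini-batch.
        g = np.stack([grad_fn(i, x[i]) for i in range(n)])
        x = x - lr * g
        if (t + 1) % H == 0:
            # Periodic global averaging: all workers synchronize to the mean.
            x = np.tile(x.mean(axis=0), (n, 1))
        else:
            # Ordinary gossip step: each worker averages with its neighbors via W.
            x = W @ x
    return x

Because W is doubly stochastic, the gossip step preserves the network-wide average, and the occasional exact average resets the consensus error that pure gossip SGD accumulates; this interplay is what the paper's analysis quantifies.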
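
The Experiment Setup row is concrete enough to restate as a schedule. A sketch assuming the warmup is linear (the paper specifies a 5-epoch warmup but not its shape); base_lr and both function names are hypothetical.

def lr_at(epoch, base_lr):
    # Warmup over the first 5 epochs (linear shape is an assumption),
    # then step decay by 10x at epochs 30, 60, and 90, as stated in the setup.
    if epoch < 5:
        return base_lr * (epoch + 1) / 5
    decay_steps = sum(epoch >= m for m in (30, 60, 90))
    return base_lr * (0.1 ** decay_steps)

def global_average_now(step, period=6):
    # Fixed period of 6 for Local SGD and Gossip-PGA; Gossip-AGA starts at 4
    # and adapts the period during training (its adaptation rule is not shown here).
    return (step + 1) % period == 0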