Accelerating Gossip SGD with Periodic Global Averaging
Authors: Yiming Chen, Kun Yuan, Yingya Zhang, Pan Pan, Yinghui Xu, Wotao Yin
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results of large-scale training on image classification (ResNet-50) and language modeling (BERT) validate our theoretical findings. |
| Researcher Affiliation | Industry | 1Alibaba Group, Hangzhou, China. |
| Pseudocode | Yes | Algorithm 1 Gossip-PGA (see the update-rule sketch after this table) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | The ImageNet-1k (Deng et al., 2009) dataset consists of 1,281,167 training images and 50,000 validation images in 1000 classes. |
| Dataset Splits | Yes | The ImageNet-1k (Deng et al., 2009) dataset consists of 1,281,167 training images and 50,000 validation images in 1000 classes. |
| Hardware Specification | No | The paper mentions training on '256 GPUs' and '64 GPUs' but does not specify the exact GPU models (e.g., NVIDIA A100, V100) or other hardware details like CPU, memory, or specific cloud instances. |
| Software Dependencies | No | The paper mentions PyTorch in its references but does not provide specific version numbers for PyTorch or any other software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | The learning rate is warmed up in the first 5 epochs and is decayed by a factor of 10 at 30, 60 and 90 epochs. We set the period to 6 for both Local SGD and Gossip-PGA. In Gossip-AGA, the period is set to 4 initially and changed adaptively afterwards; roughly 9% of iterations conduct global averaging. (See the schedule sketch after this table.) |
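
To make the "Algorithm 1 Gossip-PGA" row concrete, here is a minimal single-process NumPy simulation of gossip SGD with periodic global averaging. It assumes the standard adapt-then-combine form: every worker takes a local SGD step, then mixes with its ring neighbors, and every `period` iterations the gossip round is replaced by a global average (the period of 6 mirrors the Experiment Setup row). The ring topology, the toy quadratic objectives, and all names here are illustrative assumptions, not the authors' code.

```python
import numpy as np

def ring_mixing_matrix(n: int) -> np.ndarray:
    """Symmetric, doubly stochastic mixing matrix for a ring of n workers."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W

def gossip_pga(grad_fn, x0, n_workers, lr=0.1, period=6, steps=120, seed=0):
    """Sketch of gossip SGD with a global average every `period` iterations."""
    rng = np.random.default_rng(seed)
    W = ring_mixing_matrix(n_workers)
    X = np.tile(x0, (n_workers, 1)).astype(float)    # row i = worker i's model
    for k in range(steps):
        G = np.stack([grad_fn(X[i], i, rng) for i in range(n_workers)])
        X = X - lr * G                               # local SGD step on every worker
        if (k + 1) % period == 0:
            X[:] = X.mean(axis=0)                    # periodic global averaging
        else:
            X = W @ X                                # one gossip round on the ring
    return X.mean(axis=0)

# Toy heterogeneous objectives: worker i minimizes ||x - c_i||^2 / 2.
centers = np.linspace(-1.0, 1.0, 8)[:, None] * np.ones((1, 4))
grad = lambda x, i, rng: (x - centers[i]) + 0.01 * rng.standard_normal(x.shape)
print(gossip_pga(grad, x0=np.zeros(4), n_workers=8))  # approaches mean of centers (0)
```

The periodic global average is what distinguishes Gossip-PGA from plain gossip SGD: gossip rounds alone shrink worker disagreement slowly on sparse topologies, while the occasional exact average resets it to zero.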
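The learning-rate schedule quoted in the Experiment Setup row (5-epoch warmup, factor-of-10 decay at epochs 30, 60 and 90) can be written with PyTorch's `LambdaLR`. This is a hedged sketch: the paper does not state the warmup shape, so the linear ramp is an assumption, and the model and optimizer below are placeholders.

```python
import torch

# Placeholder model/optimizer; only the schedule mirrors the quoted setup.
model = torch.nn.Linear(128, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

def lr_lambda(epoch: int) -> float:
    if epoch < 5:
        return (epoch + 1) / 5.0                         # warmup over first 5 epochs
    return 0.1 ** sum(epoch >= m for m in (30, 60, 90))  # decay by 10 at milestones

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
for epoch in range(100):
    ...  # one training epoch, then advance the schedule
    sched.step()
```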