Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Accelerating Gossip SGD with Periodic Global Averaging
Authors: Yiming Chen, Kun Yuan, Yingya Zhang, Pan Pan, Yinghui Xu, Wotao Yin
ICML 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results of large-scale training on image classification (ResNet50) and language modeling (BERT) validate our theoretical findings. |
| Researcher Affiliation | Industry | Alibaba Group, Hangzhou, China. |
| Pseudocode | Yes | Algorithm 1 Gossip-PGA |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | The ImageNet-1k (Deng et al., 2009) dataset consists of 1,281,167 training images and 50,000 validation images in 1000 classes. |
| Dataset Splits | Yes | The ImageNet-1k (Deng et al., 2009) dataset consists of 1,281,167 training images and 50,000 validation images in 1000 classes. |
| Hardware Specification | No | The paper mentions training on '256 GPUs' and '64 GPUs' but does not specify the exact GPU models (e.g., NVIDIA A100, V100) or other hardware details like CPU, memory, or specific cloud instances. |
| Software Dependencies | No | The paper mentions PyTorch in its references but does not provide specific version numbers for PyTorch or any other software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | The learning rate is warmed up in the first 5 epochs and is decayed by a factor of 10 at 30, 60 and 90 epochs. We set the period to 6 for both Local SGD and Gossip-PGA. In Gossip-AGA, the period is set to 4 initially and changed adaptively afterwards, roughly 9% iterations conduct global averaging. |
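
The schedule quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the `base_lr` value, the linear shape of the warmup, and the convention that averaging happens on 1-indexed iterations divisible by the period are all hypothetical. Only the 5 warmup epochs, the decay by a factor of 10 at epochs 30, 60 and 90, and the period of 6 come from the quoted text.

```python
def lr_at_epoch(epoch, base_lr=0.1, warmup_epochs=5,
                milestones=(30, 60, 90), gamma=0.1):
    """Learning rate for a 0-indexed epoch.

    base_lr and the linear warmup shape are assumptions; the warmup
    length, milestones and decay factor follow the quoted setup.
    """
    if epoch < warmup_epochs:
        # Assumed linear ramp that reaches base_lr on the last warmup epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    # Multiply by gamma once for every milestone already reached.
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


def is_global_averaging_step(iteration, period=6):
    """True when a 1-indexed iteration is a multiple of the period.

    The indexing convention is an assumption made for illustration.
    """
    return iteration % period == 0


if __name__ == "__main__":
    for epoch in (0, 4, 29, 30, 60, 90):
        print(epoch, lr_at_epoch(epoch))
    steps = sum(is_global_averaging_step(k) for k in range(1, 601))
    print("global averaging steps in 600 iterations:", steps)
```

With a fixed period of 6, one iteration in six performs global averaging (about 17%), which gives a point of comparison for the roughly 9% reported for the adaptive Gossip-AGA schedule.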