SAPipe: Staleness-Aware Pipeline for Data Parallel DNN Training

Authors: Yangrui Chen, Cong Xie, Meng Ma, Juncheng Gu, Yanghua Peng, Haibin Lin, Chuan Wu, Yibo Zhu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that SAPipe achieves up to 157% speedups over BytePS (non-stale), and outperforms PipeSGD in accuracy by up to 13.7%.
Researcher Affiliation | Collaboration | Yangrui Chen (The University of Hong Kong, yrchen@cs.hku.hk); Cong Xie (ByteDance, cong.xie@bytedance.com); Meng Ma (ByteDance, meng.ma@bytedance.com); Juncheng Gu (ByteDance, juncheng.gu@bytedance.com); Yanghua Peng (ByteDance, pengyanghua.yanghua@bytedance.com); Haibin Lin (ByteDance, haibin.lin@bytedance.com); Chuan Wu (The University of Hong Kong, cwu@cs.hku.hk); Yibo Zhu (ByteDance, zhuyibo@bytedance.com)
Pseudocode | Yes | Algorithm 1 Distributed Training / Staleness Training Pipeline (PipeSGD), Algorithm 2 Staleness-Aware Pipeline with Delay Compensation (SAPipe-DC), Algorithm 3 Staleness-Aware Pipeline with Weight Prediction (SAPipe-WP). (A sketch of the two compensation ideas appears after this table.)
Open Source Code | Yes | Code: https://github.com/ChenAris/sapipe.git
Open Datasets | Yes | We train CV models on two datasets: (i) CIFAR-10 [16] and (ii) ImageNet [17]. We fine-tune the pretrained GPT-2 model on (iii) the WikiText-2 language modeling dataset [20]. The Transformer model is trained on (iv) Multi30K [8] for the WMT16 English-to-German Multimodal Translation task.
Dataset Splits | Yes | We train CV models on two datasets: (i) CIFAR-10 [16] and (ii) ImageNet [17]. We fine-tune the pretrained GPT-2 model on (iii) the WikiText-2 language modeling dataset [20]. The Transformer model is trained on (iv) Multi30K [8] for the WMT16 English-to-German Multimodal Translation task.
Hardware Specification | Yes | We evaluate SAPipe on 8 physical machines, each equipped with 90 CPU cores, 320GB memory, 8 Tesla V100 GPUs with NVLinks, and 100Gbps bandwidth between any two machines.
Software Dependencies | No | The paper mentions the 'BytePS framework, compatible to both TensorFlow and PyTorch' and that 'All baselines and SAPipe are run on PyTorch computation framework.' However, it does not provide specific version numbers for these software components.
Experiment Setup | Yes | The batch sizes per GPU are 128 images, 128 images, 80 tokens, and 3200 tokens, respectively. We adopt the SGD optimizer with 0.9 Polyak's momentum [24] and 5e-5 weight decay when training the VGG16 and ResNet50 models, and the Adam [14] optimizer with (0.9, 0.98) betas for NLP models. The global learning rates for VGG16, ResNet50, and GPT-2 are 0.1, 0.1, and 5e-5, respectively... SAPipe uses Option 3 in Algorithm 3 as the default staleness compensation method, with λ empirically set as 0.2. (An illustrative optimizer configuration also appears after this table.)
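
The Pseudocode row names two staleness-compensation strategies, delay compensation (SAPipe-DC) and weight prediction (SAPipe-WP). The snippet below is a minimal sketch of these two ideas, not the paper's Algorithms 2-3: it assumes a DC-ASGD-style element-wise Taylor correction for delay compensation and a momentum-buffer lookahead for weight prediction, and the helper names `delay_compensate` and `predict_weights` are hypothetical.

```python
# Hedged sketch of the two compensation ideas; exact update rules are in the paper.
import torch

LAMBDA = 0.2   # the paper's empirical default for the compensation coefficient
LR = 0.1       # illustrative learning rate (VGG16 / ResNet50 use 0.1)


def delay_compensate(stale_grad, w_now, w_stale, lam=LAMBDA):
    """Correct a gradient computed on stale weights (SAPipe-DC flavour).

    Approximates the missing curvature term with the element-wise surrogate
    lam * g * g * (w_now - w_stale), as in DC-ASGD-style compensation.
    """
    return stale_grad + lam * stale_grad * stale_grad * (w_now - w_stale)


def predict_weights(w, momentum_buf, steps_ahead=1, lr=LR):
    """Extrapolate weights `steps_ahead` updates forward (SAPipe-WP flavour).

    Uses the SGD momentum buffer to estimate where the weights will be when
    the delayed gradient is eventually applied.
    """
    return w - lr * steps_ahead * momentum_buf


# Toy usage on a single tensor
w_now, w_stale = torch.randn(4), torch.randn(4)
g_stale = torch.randn(4)
buf = torch.zeros(4)
print(delay_compensate(g_stale, w_now, w_stale))
print(predict_weights(w_now, buf))
```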
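
For the Experiment Setup row, the following is an illustrative PyTorch optimizer configuration matching the quoted hyperparameters; the `torch.nn.Linear` models are placeholders standing in for VGG16/ResNet50 and GPT-2/Transformer, and this is not the SAPipe training script, which runs on top of BytePS.

```python
# Illustrative optimizer setup only; model objects are stand-ins.
import torch

cv_model = torch.nn.Linear(8, 8)    # placeholder for VGG16 / ResNet50
nlp_model = torch.nn.Linear(8, 8)   # placeholder for GPT-2 / Transformer

# CV models: SGD with 0.9 Polyak momentum, 5e-5 weight decay, global lr 0.1
cv_opt = torch.optim.SGD(cv_model.parameters(), lr=0.1,
                         momentum=0.9, weight_decay=5e-5)

# NLP models: Adam with betas (0.9, 0.98); GPT-2 fine-tuning uses lr 5e-5
nlp_opt = torch.optim.Adam(nlp_model.parameters(), lr=5e-5, betas=(0.9, 0.98))
```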