Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Group-based Interleaved Pipeline Parallelism for Large-scale DNN Training
Authors: PengCheng Yang, Xiaoming Zhang, Wenpeng Zhang, Ming Yang, Hong Wei
ICLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments, which train large BERT language models, show that compared to PipeDream-2BW, WPipe achieves 1.4× acceleration and reduces the memory footprint by 36%, with nearly no loss in final model accuracy. |
| Researcher Affiliation | Industry | Pengcheng Yang, Xiaoming Zhang, Wenpeng Zhang, Ming Yang, Hong Wei; Ant Group, China |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper mentions using PyTorch, transformers, and apex, which are open-source, but it does not provide a link to its own implementation code for WPipe. |
| Open Datasets | Yes | We finetuned BERT-Base (Devlin et al., 2018) and BERT-Large (Devlin et al., 2018) for WPipe, PipeDream-2BW, and data parallelism on the QQP and MNLI tasks (Wang et al., 2018). ... We finetuned, respectively, the ResNeXt50 (32x4d) (Xie et al., 2017) and ResNeXt101 (32x8d) (Xie et al., 2017) for WPipe, PipeDream-2BW, and data parallelism on the three datasets of CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), and Oxford 102 Flowers (Nilsback & Zisserman, 2008). |
| Dataset Splits | No | The paper mentions using several standard datasets for training and evaluation, but it does not explicitly describe the training, validation, and test splits (e.g., percentages, sample counts, or references to predefined splits with specific citations of how they were applied in this work). |
| Hardware Specification | Yes | WPipe is implemented with PyTorch-1.4 (Edward Z. Yang, 2021) and executes on two environments, i.e., a single machine with eight 16-GB V100 GPUs (Env-1) and a private cluster with 8×8 V100 GPUs (Env-2). ... there are 8 machines in our private cluster, and each machine has 8 GPUs with a memory size of 16GB, Intel(R) Xeon(R) Platinum 8163 CPU, 512GB of RAM with a 25Gbps Ethernet interface, and 300GBps NVLink |
| Software Dependencies | Yes | WPipe is implemented with PyTorch-1.4 (Edward Z. Yang, 2021)... We used respectively bert-base-uncased and bert-large-uncased pre-training weights from transformers-3.5.0 (Wolf et al., 2020). |
| Experiment Setup | Yes | We used the Adam optimizer, a learning rate of 8×10⁻⁵ (ν = 8×10⁻⁵) with 1000 warmup steps (ws = 1000) and a mini-batch size of 256 (b = 256) for BERT-Base, and the same optimizer, ν = 4×10⁻⁵ with ws = 2000 and b = 128 for BERT-Large. ... We used the pre-training weights from torchvision (Francisco Massa, 2021), the SGD optimizer, ν = 1×10⁻² with a 0.05 warmup ratio and b = 256. |
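The hyperparameters quoted in the Experiment Setup row can be collected into a single sketch for anyone attempting a reproduction. This is a minimal illustration, not code from the paper; the dictionary structure, key names, and `get_config` helper are assumptions, while the values themselves come from the quoted text.

```python
# Hypothetical summary of the fine-tuning settings reported in the paper.
# Keys and helper are illustrative; values are taken from the quoted setup.
FINETUNE_CONFIGS = {
    "bert-base":  {"optimizer": "Adam", "lr": 8e-5, "warmup_steps": 1000, "batch_size": 256},
    "bert-large": {"optimizer": "Adam", "lr": 4e-5, "warmup_steps": 2000, "batch_size": 128},
    "resnext":    {"optimizer": "SGD",  "lr": 1e-2, "warmup_ratio": 0.05, "batch_size": 256},
}

def get_config(model_family: str) -> dict:
    """Look up the reported hyperparameters for a model family."""
    return FINETUNE_CONFIGS[model_family]
```

Note that the BERT runs use step-count warmup while the ResNeXt runs use a warmup ratio, so the two entries carry different keys.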