Zero Bubble (Almost) Pipeline Parallelism
Authors: Penghui Qi, Xinyi Wan, Guangxing Huang, Min Lin
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluations show that our method outperforms the 1F1B schedule up to 15% in throughput under a similar memory limit. |
| Researcher Affiliation | Industry | Penghui Qi, Xinyi Wan, Guangxing Huang & Min Lin, Sea AI Lab, {qiph,wanxy,huanggx,linmin}@sea.com |
| Pseudocode | Yes | Algorithm 1: In-place rollback for AdamW (a hedged sketch of such a rollback appears after this table). |
| Open Source Code | Yes | The source code based on Megatron-LM is publicly available at https://github.com/sail-sg/zero-bubble-pipeline-parallelism. |
| Open Datasets | No | The paper states using 'models analogous to GPT-3' but does not specify the dataset used for training or provide any concrete access information (link, DOI, citation) for a publicly available dataset. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits or mention the methodology for creating such splits. It only describes the models and experimental settings. |
| Hardware Specification | Yes | Our experiments utilize up to 32 NVIDIA A100 SXM 80G GPUs distributed across 4 nodes interconnected by a RoCE RDMA network. |
| Software Dependencies | No | The paper mentions basing its implementation on the 'open-source Megatron-LM project' but does not provide specific version numbers for Megatron-LM or any other software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Table 3 details specific model configurations including Layers, Attention Heads, Hidden Size, Sequence Length, Pipelines (GPUs), Microbatch Size, and Number of Microbatches used in experiments. The text also describes how profiling measurements for T_F, T_B, T_W, and T_comm are collected and used by the automatic pipeline scheduling algorithm (a generic timing sketch follows the table). |
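
The pseudocode row refers to Algorithm 1, which rolls an optimistically applied optimizer update back in place when post-update validation (e.g., a global gradient-norm or NaN check) fails. The sketch below is not the paper's Algorithm 1; it is a minimal PyTorch illustration, under the assumption that the gradient used in the step is still available, of why a decoupled-weight-decay Adam (AdamW) update can be inverted in place from the stored optimizer states. The `adamw_step_` / `adamw_rollback_` names are hypothetical.

```python
import torch

def adamw_step_(p, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
                eps=1e-8, wd=0.01):
    """One decoupled-weight-decay Adam (AdamW) update, applied in place."""
    m.mul_(beta1).add_(g, alpha=1 - beta1)          # first-moment estimate
    v.mul_(beta2).addcmul_(g, g, value=1 - beta2)   # second-moment estimate
    m_hat = m / (1 - beta1 ** step)                 # bias correction
    v_hat = v / (1 - beta2 ** step)
    p.mul_(1 - lr * wd).addcdiv_(m_hat, v_hat.sqrt().add_(eps), value=-lr)

def adamw_rollback_(p, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
                    eps=1e-8, wd=0.01):
    """Invert adamw_step_ in place, given the same gradient g and step count."""
    # Undo the parameter update first, using the post-step moments.
    m_hat = m / (1 - beta1 ** step)
    v_hat = v / (1 - beta2 ** step)
    p.addcdiv_(m_hat, v_hat.sqrt().add_(eps), value=lr).div_(1 - lr * wd)
    # Then undo the moment updates.
    m.sub_(g, alpha=1 - beta1).div_(beta1)
    v.addcmul_(g, g, value=-(1 - beta2)).div_(beta2)

# Round-trip check (exact in real arithmetic, approximate under float32):
# p, g = torch.randn(4), torch.randn(4)
# m, v = torch.zeros(4), torch.zeros(4)
# p0 = p.clone()
# adamw_step_(p, g, m, v, step=1)
# adamw_rollback_(p, g, m, v, step=1)
# assert torch.allclose(p, p0, atol=1e-6)
```

Because the update is invertible from the post-step states and the gradient, the rollback needs no extra copy of the parameters or moments, which is the memory property an in-place rollback relies on.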
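
The experiment-setup row refers to profiled timings consumed by the automatic scheduler: per-stage forward (T_F), backward-with-respect-to-input (T_B), and backward-with-respect-to-weight (T_W) times, plus communication time (T_comm). As a generic illustration only, and not Megatron-LM's or the paper's profiler, the CUDA-event helper below shows one common way to collect such per-pass averages; `profile_stage` and the `stage_*` callables in the usage comment are hypothetical names.

```python
import torch

def profile_stage(fn, warmup=3, iters=10):
    """Average GPU wall-clock time (ms) of one callable, measured with CUDA events.

    Hypothetical helper: fn would be a closure running one pipeline-stage pass,
    e.g. the forward (F), backward-input (B), or backward-weight (W) computation
    of a single microbatch on one stage. Requires a CUDA device.
    """
    for _ in range(warmup):            # warm up kernels before timing
        fn()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# Sketch of use: profile each pass type once, then hand the averages
# (together with a measured communication time) to the scheduling algorithm.
# t_f = profile_stage(lambda: stage_forward(sample_input))
# t_b = profile_stage(lambda: stage_backward_input(sample_grad))
# t_w = profile_stage(lambda: stage_backward_weight())
```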