Zero Bubble (Almost) Pipeline Parallelism

Authors: Penghui Qi, Xinyi Wan, Guangxing Huang, Min Lin

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental evaluations show that our method outperforms the 1F1B schedule up to 15% in throughput under a similar memory limit.
Researcher Affiliation | Industry | Penghui Qi, Xinyi Wan, Guangxing Huang & Min Lin, Sea AI Lab, {qiph,wanxy,huanggx,linmin}@sea.com
Pseudocode | Yes | Algorithm 1: In-place rollback for AdamW (see the first sketch after this table).
Open Source Code | Yes | The source code based on Megatron-LM is publicly available at https://github.com/sail-sg/zero-bubble-pipeline-parallelism.
Open Datasets | No | The paper states it uses models analogous to GPT-3 but does not specify the training dataset or provide any concrete access information (link, DOI, citation) for a publicly available dataset.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits or describe a methodology for creating them; it only describes the models and experimental settings.
Hardware Specification | Yes | Our experiments utilize up to 32 NVIDIA A100 SXM 80G GPUs distributed across 4 nodes interconnected by a RoCE RDMA network.
Software Dependencies | No | The paper mentions basing its implementation on the open-source Megatron-LM project but does not provide specific version numbers for Megatron-LM or for other dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Table 3 details model configurations including Layers, Attention Heads, Hidden Size, Sequence Length, Pipelines (GPUs), Microbatch Size, and Number of Microbatches. The text also describes how profiling measurements for T_F, T_B, T_W, and T_comm are collected and used by the automatic pipeline scheduling algorithm (see the second sketch after this table).
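The paper's Algorithm 1 is an in-place rollback for AdamW, used so an optimizer step can be applied optimistically and undone if a late global check (e.g., gradient clipping or a NaN/Inf scan) invalidates it. The following is only a minimal sketch of the underlying arithmetic under stated assumptions, not the paper's exact Algorithm 1: it assumes a plain per-tensor AdamW update, that the gradient used in the step is still available at rollback time, and the helper names adamw_step / adamw_rollback are hypothetical.

```python
# Hypothetical sketch of reversing one AdamW step in place (not the paper's
# exact Algorithm 1). The inverse is exact up to floating-point round-off.
import numpy as np

def adamw_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """Apply one decoupled-weight-decay Adam (AdamW) update in place."""
    t += 1
    p *= 1.0 - lr * wd                        # decoupled weight decay
    m[:] = b1 * m + (1.0 - b1) * g            # first-moment update
    v[:] = b2 * v + (1.0 - b2) * g * g        # second-moment update
    m_hat = m / (1.0 - b1 ** t)               # bias correction
    v_hat = v / (1.0 - b2 ** t)
    p -= lr * m_hat / (np.sqrt(v_hat) + eps)  # parameter update
    return t

def adamw_rollback(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """Undo the most recent AdamW step in place, reusing the same gradient g."""
    m_hat = m / (1.0 - b1 ** t)               # moment states are post-step here
    v_hat = v / (1.0 - b2 ** t)
    p += lr * m_hat / (np.sqrt(v_hat) + eps)  # invert the parameter update
    m[:] = (m - (1.0 - b1) * g) / b1          # invert the moment updates
    v[:] = (v - (1.0 - b2) * g * g) / b2
    p /= 1.0 - lr * wd                        # invert the weight decay
    return t - 1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = rng.standard_normal(8); p0 = p.copy()
    g = rng.standard_normal(8)
    m = np.zeros(8); v = np.zeros(8)
    t = adamw_step(p, g, m, v, t=0)
    t = adamw_rollback(p, g, m, v, t)
    assert np.allclose(p, p0) and t == 0      # round-trip restores the state
```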
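The experiment setup also relies on profiled per-stage timings (T_F, T_B, T_W, T_comm) that feed the automatic scheduling algorithm. The sketch below is only an assumed illustration of how per-microbatch forward and backward times could be measured with standard PyTorch CUDA events; it does not reproduce the paper's Megatron-LM instrumentation and does not split the backward pass into its B (input-gradient) and W (weight-gradient) parts. The function name profile_stage and the toy stage are hypothetical.

```python
# Hypothetical illustration: timing one pipeline stage per microbatch with
# CUDA events to obtain averages in the spirit of T_F and T_B.
import torch

def profile_stage(stage: torch.nn.Module, microbatches, warmup: int = 2):
    """Return mean forward/backward times (ms) over the profiled microbatches."""
    fwd_ms, bwd_ms = [], []
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for i, x in enumerate(microbatches):
        start.record()
        out = stage(x)
        end.record()
        torch.cuda.synchronize()
        if i >= warmup:                        # skip warm-up iterations
            fwd_ms.append(start.elapsed_time(end))

        loss = out.float().pow(2).mean()       # stand-in loss for illustration
        start.record()
        loss.backward()
        end.record()
        torch.cuda.synchronize()
        if i >= warmup:
            bwd_ms.append(start.elapsed_time(end))
        stage.zero_grad(set_to_none=True)      # discard illustrative gradients
    return sum(fwd_ms) / len(fwd_ms), sum(bwd_ms) / len(bwd_ms)

if __name__ == "__main__":
    if torch.cuda.is_available():
        stage = torch.nn.Linear(1024, 1024).cuda()
        data = [torch.randn(4, 1024, device="cuda") for _ in range(10)]
        t_f, t_b = profile_stage(stage, data)
        print(f"T_F ~ {t_f:.3f} ms, T_B ~ {t_b:.3f} ms")
```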