Pipeline Parallelism with Controllable Memory

Authors: Penghui Qi, Xinyi Wan, Nyamdavaa Amar, Min Lin

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluations demonstrate that in pure pipeline parallelism settings, our methods outperform 1F1B by 7% to 55% in terms of throughput. When employing a grid search over hybrid parallelism hyperparameters in practical scenarios, our methods demonstrate a 16% throughput improvement over the 1F1B baseline for large language models.
Researcher Affiliation | Collaboration | Penghui Qi (1,2), Xinyi Wan (1), Nyamdavaa Amar (2), Min Lin (1); 1 Sea AI Lab, 2 National University of Singapore
Pseudocode | No | No pseudocode or algorithm blocks are explicitly labeled or presented in a structured format.
Open Source Code | Yes | The implementation is open-sourced at this url.
Open Datasets | No | The paper mentions models analogous to GPT-3 and states that its implementation is based on Megatron-LM, but it does not specify the dataset used or provide access information for it, so public availability of the dataset cannot be confirmed.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It mentions using "models detailed in Table 2 analogous to GPT-3" but gives no splitting information.
Hardware Specification | Yes | Our implementation is based on the open-source Megatron-LM project [Narayanan et al., 2021] and is experimented on up to 40 NVIDIA A100 SXM 80G GPUs distributed across 5 nodes interconnected by a RoCE RDMA network.
Software Dependencies | No | The paper states that its implementation is based on the open-source Megatron-LM project, but it does not specify any software components with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | For each method, the best result from the grid search is reported. We present the best result for each pipeline parallel schedule in Table 3 together with the corresponding parameters. (A hedged sketch of such a grid search appears below this table.)
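The "Research Type" and "Experiment Setup" rows refer to a grid search over hybrid-parallelism hyperparameters on up to 40 A100 GPUs. The sketch below is only an illustration of that kind of search, not code from the paper or from Megatron-LM: it enumerates candidate tensor-parallel, pipeline-parallel, and microbatch settings, skips combinations that do not evenly divide the GPU count, and keeps the configuration with the best measured throughput. The function benchmark_throughput, the constant NUM_GPUS, and the candidate value ranges are all hypothetical placeholders.

    # Hypothetical sketch of a grid search over hybrid-parallelism settings.
    # benchmark_throughput stands in for a real Megatron-LM training run and
    # is NOT an API from the paper or from Megatron-LM.
    from itertools import product

    NUM_GPUS = 40  # e.g. 40 x A100 80G across 5 nodes, as reported in the paper

    def benchmark_throughput(tp: int, pp: int, microbatch_size: int) -> float:
        """Placeholder: launch a short training run and return samples/sec."""
        raise NotImplementedError("replace with a real benchmarking launch")

    def grid_search():
        best = None
        # Candidate ranges are illustrative, not the values used in the paper.
        for tp, pp, mbs in product([1, 2, 4, 8], [4, 5, 8, 10, 20], [1, 2, 4]):
            if NUM_GPUS % (tp * pp) != 0:
                continue  # data-parallel size must be a whole number
            dp = NUM_GPUS // (tp * pp)
            throughput = benchmark_throughput(tp, pp, mbs)
            if best is None or throughput > best[0]:
                best = (throughput, {"tp": tp, "pp": pp, "dp": dp, "microbatch": mbs})
        return best

Reported numbers such as the 16% improvement over the 1F1B baseline would then correspond to comparing the best configuration found for each pipeline schedule, as the Experiment Setup row describes.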