Pipeline Parallelism with Controllable Memory
Authors: Penghui Qi, Xinyi Wan, Nyamdavaa Amar, Min Lin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluations demonstrate that in pure pipeline parallelism settings, our methods outperform 1F1B by 7% to 55% in terms of throughput. When employing a grid search over hybrid parallelism hyperparameters in practical scenarios, our methods demonstrate a 16% throughput improvement over the 1F1B baseline for large language models. |
| Researcher Affiliation | Collaboration | Penghui Qi (1,2), Xinyi Wan (1), Nyamdavaa Amar (2), Min Lin (1); 1: Sea AI Lab, 2: National University of Singapore |
| Pseudocode | No | No pseudocode or algorithm blocks are explicitly labeled or presented in a structured format. |
| Open Source Code | Yes | The implementation is open-sourced at this url. |
| Open Datasets | No | The paper mentions models analogous to GPT-3 and states that their implementation is based on Megatron-LM, but does not specify the dataset used or provide access information for it. Therefore, it's not possible to confirm public availability of the dataset with concrete access information. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It mentions using 'models detailed in Table 2 analogous to GPT-3' but no splitting information. |
| Hardware Specification | Yes | Our implementation is based on the open-source Megatron-LM project [Narayanan et al., 2021] and is experimented on up to 40 NVIDIA A100 SXM 80G GPUs distributed across 5 nodes interconnected by a RoCE RDMA network. |
| Software Dependencies | No | The paper mentions its implementation is based on the open-source Megatron-LM project, but it does not specify any software components with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For each method, the best result from the grid search is reported. We present the best result for each pipeline parallel schedule, together with the corresponding parameters, in Table 3. (An illustrative sketch of such a grid search appears after this table.) |
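
The Experiment Setup row quotes a grid search over hybrid parallelism hyperparameters, with the best throughput per schedule reported. The sketch below is only an illustration of that kind of search, not the authors' code or Megatron-LM's API: the search space, the `measure_throughput` stub, and its placeholder scoring formula are all assumptions made for readability.

```python
from itertools import product

# Hypothetical search space; the paper does not enumerate the exact grid in this table.
PIPELINE_PARALLEL_SIZES = [4, 8]      # pipeline stages (pp)
TENSOR_PARALLEL_SIZES = [1, 2, 4]     # tensor-parallel degree (tp)
MICRO_BATCH_COUNTS = [32, 64, 128]    # micro-batches per iteration

def measure_throughput(pp: int, tp: int, micro_batches: int) -> float:
    """Stand-in for launching a training run with the given parallel configuration
    and timing it (e.g., via a Megatron-LM launch script). Returns a synthetic
    score here so the sketch runs end to end."""
    return micro_batches / (pp * 0.1 + tp * 0.05)  # placeholder formula, not a real model

def grid_search():
    """Try every configuration and keep the one with the highest measured throughput,
    mirroring the 'best result from the grid search is reported' setup."""
    best_score, best_config = float("-inf"), None
    for pp, tp, mb in product(PIPELINE_PARALLEL_SIZES,
                              TENSOR_PARALLEL_SIZES,
                              MICRO_BATCH_COUNTS):
        score = measure_throughput(pp, tp, mb)
        if score > best_score:
            best_score = score
            best_config = {"pp": pp, "tp": tp, "micro_batches": mb}
    return best_score, best_config

if __name__ == "__main__":
    print(grid_search())
```

In an actual reproduction, `measure_throughput` would wrap a real training launch on the cluster described in the Hardware Specification row and return observed tokens per second; only the enumerate-and-keep-best structure is meant to carry over.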