AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness
Authors: Dacheng Li, Hongyi Wang, Eric Xing, Hao Zhang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate AMP on popular models and cluster setups from public clouds and show that AMP returns parallel strategies that match the expert-tuned strategies on typical cluster setups. On heterogeneous clusters or models with heterogeneous architectures, AMP finds strategies with 1.54× and 1.77× higher throughput than state-of-the-art model-parallel systems, respectively. |
| Researcher Affiliation | Collaboration | Dacheng Li (Carnegie Mellon University), Hongyi Wang (Carnegie Mellon University), Eric Xing (Mohamed Bin Zayed University of Artificial Intelligence; Carnegie Mellon University; Petuum Inc.), Hao Zhang (University of California, Berkeley) |
| Pseudocode | Yes | Algorithm 1: Optimization procedure (a hedged sketch of this kind of search appears after the table) |
| Open Source Code | Yes | Code and experiment logs are available at https://github.com/MccRree177/AMP for reproducibility. |
| Open Datasets | No | The paper uses specific model architectures (GPT-2, TransGAN) and evaluates training throughput, but it does not specify which public datasets (e.g., text corpora for GPT-2) were used for training, nor does it provide access information for such data. |
| Dataset Splits | No | The paper describes model architectures and batch sizes for experiments, but it does not specify training, validation, or test dataset splits. |
| Hardware Specification | Yes | We conduct experiments using GPT-2 (L = 24, H = 1024) [20] on 4 AWS EC2 g4dn.12xlarge nodes with a global batch size of 32. Each instance is equipped with 4 T4 GPUs connected over PCIe with 50 Gbps intra-node bandwidth, and inter-node bandwidth is 50 Gbps. |
| Software Dependencies | No | The paper mentions 'The underlying system is Deepspeed (built on top of the Megatron engine) [21] with fp16 optimization enabled.' but does not provide specific version numbers for these software components (a minimal, assumed configuration sketch follows the table). |
| Experiment Setup | Yes | We conduct experiments using GPT-2 (L = 24, H = 1024) [20] on 4 AWS EC2 g4dn.12xlarge nodes with a global batch size of 32. |
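
The "Pseudocode" row above refers to the paper's "Algorithm 1: Optimization procedure". The snippet below is only a minimal sketch of the kind of cost-model-guided search such a procedure performs, assuming a search over (data, tensor, pipeline) parallel degrees ranked by an estimated iteration time; the function names and the toy cost constants are illustrative assumptions, not the paper's actual algorithm, which uses an analytical model of compute and communication that accounts for cluster heterogeneity.

```python
# Hypothetical sketch of a cost-model-guided search over 3D-parallel strategies.
# The enumeration and the toy cost model below are illustrative assumptions.
from itertools import product


def candidate_strategies(num_gpus):
    """Yield (dp, tp, pp) degrees whose product equals the GPU count."""
    for dp, tp, pp in product(range(1, num_gpus + 1), repeat=3):
        if dp * tp * pp == num_gpus:
            yield dp, tp, pp


def toy_iter_time(dp, tp, pp, layers=24, micro_batches=8):
    """Toy cost: per-device compute plus crude penalties for tensor-parallel
    communication and the pipeline bubble. Constants are illustrative only."""
    compute = layers / (dp * tp * pp)             # work split across all GPUs
    tp_comm = 0.05 * layers * (tp - 1)            # all-reduce cost grows with tp
    bubble = compute * (pp - 1) / micro_batches   # pipeline fill/drain overhead
    return compute + tp_comm + bubble


num_gpus = 4 * 4  # 4 nodes x 4 T4 GPUs, as in the paper's setup
best = min(candidate_strategies(num_gpus), key=lambda s: toy_iter_time(*s))
print("best (dp, tp, pp):", best)
```

In the paper, the ranking step would be driven by AMP's analytical cost model rather than the `toy_iter_time` stand-in above.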
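
The "Software Dependencies" row notes that the paper states the underlying system is DeepSpeed with fp16 optimization enabled, without version numbers or a full configuration. Below is a minimal, assumed sketch of what such a DeepSpeed configuration could look like; only the fp16 flag and the global batch size of 32 come from the paper, and the file name and launcher invocation are generic illustrations.

```python
# Minimal, assumed DeepSpeed configuration sketch. Only fp16 and the global
# batch size of 32 are confirmed by the paper; everything else is illustrative.
import json

ds_config = {
    "train_batch_size": 32,      # global batch size reported in the paper
    "fp16": {"enabled": True},   # mixed-precision training, as stated
}

# DeepSpeed reads this as a JSON file passed to the launcher, e.g.:
#   deepspeed train.py --deepspeed --deepspeed_config ds_config.json
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```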