AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness

Authors: Dacheng Li, Hongyi Wang, Eric Xing, Hao Zhang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate AMP on popular models and cluster setups from public clouds and show that AMP returns parallel strategies that match the expert-tuned strategies on typical cluster setups. On heterogeneous clusters or models with heterogeneous architectures, AMP finds strategies with 1.54× and 1.77× higher throughput than state-of-the-art model-parallel systems, respectively.
Researcher Affiliation | Collaboration | Dacheng Li (Carnegie Mellon University), Hongyi Wang (Carnegie Mellon University), Eric Xing (Mohamed Bin Zayed University of Artificial Intelligence, Carnegie Mellon University, Petuum Inc.), Hao Zhang (University of California, Berkeley)
Pseudocode | Yes | Algorithm 1: Optimization procedure (an illustrative sketch of this kind of search appears after the table)
Open Source Code | Yes | Codes and experiment logs are available at https://github.com/MccRree177/AMP for reproducibility.
Open Datasets | No | The paper uses specific model architectures (GPT-2, TransGAN) and evaluates training throughput, but it does not specify which public datasets (e.g., text corpora for GPT-2) were used for training, nor does it provide access information for such data.
Dataset Splits | No | The paper describes model architectures and batch sizes for experiments, but it does not specify training, validation, or test dataset splits.
Hardware Specification | Yes | We conduct experiments using GPT-2 (L = 24, H = 1024) [20] on 4 AWS EC2 g4dn.12xlarge nodes with a global batch size of 32. Each instance is equipped with 4 T4 GPUs, with 50 Gbps PCIe intra-node bandwidth and 50 Gbps inter-node bandwidth.
Software Dependencies | No | The paper mentions 'The underlying system is Deepspeed (built on top of the Megatron engine) [21] with fp16 optimization enabled.' but does not provide specific version numbers for these software components.
Experiment Setup | Yes | We conduct experiments using GPT-2 (L = 24, H = 1024) [20] on 4 AWS EC2 g4dn.12xlarge nodes with a global batch size of 32. (A worked batch-size decomposition for this setup appears after the table.)
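
The Pseudocode row references Algorithm 1, the paper's optimization procedure. The snippet below is not that algorithm; it is a minimal sketch of the general pattern such a procedure follows, assuming a search that enumerates candidate (data, tensor, pipeline) parallel degrees and ranks them with a cost model. The Strategy type, enumerate_strategies, pick_best, and the toy cost model are all hypothetical illustrations, not code from the AMP repository.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable

@dataclass(frozen=True)
class Strategy:
    """Hypothetical container for one (data, tensor, pipeline) parallel assignment."""
    data_parallel: int
    tensor_parallel: int
    pipeline_parallel: int

def enumerate_strategies(num_gpus: int) -> list[Strategy]:
    """Enumerate degree triples whose product uses all GPUs (illustrative only)."""
    return [
        Strategy(dp, tp, pp)
        for dp, tp, pp in product(range(1, num_gpus + 1), repeat=3)
        if dp * tp * pp == num_gpus
    ]

def pick_best(num_gpus: int, cost_model: Callable[[Strategy], float]) -> Strategy:
    """Rank candidates by a user-supplied cost model and return the cheapest one."""
    return min(enumerate_strategies(num_gpus), key=cost_model)

if __name__ == "__main__":
    # Toy cost model (not AMP's): it merely penalizes deep pipelines and wide
    # tensor parallelism, so pure data parallelism wins on 16 GPUs.
    best = pick_best(16, cost_model=lambda s: s.pipeline_parallel + 0.5 * s.tensor_parallel)
    print(best)
```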
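
The Hardware Specification and Experiment Setup rows fix the cluster at 4 nodes × 4 T4 GPUs = 16 GPUs and the global batch size at 32. As a small worked example of how those numbers decompose (the data-parallel degree and gradient-accumulation count below are assumptions, not settings reported in the paper), DeepSpeed relates the quantities as global batch = per-GPU micro-batch × gradient-accumulation steps × data-parallel degree.

```python
def micro_batch_size(global_batch: int, data_parallel: int, grad_accum_steps: int) -> int:
    """Solve global_batch = micro_batch * grad_accum_steps * data_parallel for micro_batch."""
    replicas = data_parallel * grad_accum_steps
    if global_batch % replicas != 0:
        raise ValueError("global batch must divide evenly across replicas and accumulation steps")
    return global_batch // replicas

# 4 nodes x 4 T4 GPUs = 16 GPUs. If all 16 were used for data parallelism with no
# gradient accumulation (an assumption, not a setting reported in the paper),
# a global batch of 32 would give 2 samples per GPU per step.
print(micro_batch_size(global_batch=32, data_parallel=16, grad_accum_steps=1))  # -> 2
```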