AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness
Authors: Dacheng Li, Hongyi Wang, Eric Xing, Hao Zhang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate AMP on popular models and cluster setups from public clouds and show that AMP returns parallel strategies that match the expert-tuned strategies on typical cluster setups. On heterogeneous clusters or models with heterogeneous architectures, AMP finds strategies with 1.54× and 1.77× higher throughput than state-of-the-art model-parallel systems, respectively. |
| Researcher Affiliation | Collaboration | Dacheng Li (Carnegie Mellon University), Hongyi Wang (Carnegie Mellon University), Eric Xing (Mohamed Bin Zayed University of Artificial Intelligence; Carnegie Mellon University; Petuum Inc.), Hao Zhang (University of California, Berkeley) |
| Pseudocode | Yes | Algorithm 1: Optimization procedure (a hedged sketch of this kind of search appears after the table) |
| Open Source Code | Yes | Code and experiment logs are available at https://github.com/MccRree177/AMP for reproducibility. |
| Open Datasets | No | The paper uses specific model architectures (GPT-2, TransGAN) and evaluates training throughput, but it does not specify which public datasets (e.g., text corpora for GPT-2) were used for training, nor does it provide access information for such data. |
| Dataset Splits | No | The paper describes model architectures and batch sizes for experiments, but it does not specify training, validation, or test dataset splits. |
| Hardware Specification | Yes | We conduct experiments using GPT-2 (L = 24, H = 1024) [20] on 4 AWS EC2 g4dn.12xlarge nodes with a global batch size of 32. Each instance is equipped with 4 T4 GPUs connected over PCIe with 50 Gbps intra-node bandwidth, and inter-node bandwidth is 50 Gbps. |
| Software Dependencies | No | The paper mentions 'The underlying system is Deepspeed (built on top of the Megatron engine) [21] with fp16 optimization enabled.' but does not provide specific version numbers for these software components (a minimal, assumed configuration sketch follows the table). |
| Experiment Setup | Yes | We conduct experiments using GPT-2 (L = 24, H = 1024) [20] on 4 AWS EC2 g4dn.12xlarge nodes with a global batch size of 32. |
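
The "Pseudocode" row above refers to the paper's "Algorithm 1: Optimization procedure". The snippet below is only a minimal sketch of the kind of cost-model-guided search such a procedure performs, assuming a search over (data, tensor, pipeline) parallel degrees ranked by an estimated iteration time; the function names and the toy cost constants are illustrative assumptions, not the paper's actual algorithm, which uses an analytical model of compute and communication that accounts for cluster heterogeneity.

```python
# Hypothetical sketch of a cost-model-guided search over 3D-parallel strategies.
# The enumeration and the toy cost model below are illustrative assumptions.
from itertools import product


def candidate_strategies(num_gpus):
    """Yield (dp, tp, pp) degrees whose product equals the GPU count."""
    for dp, tp, pp in product(range(1, num_gpus + 1), repeat=3):
        if dp * tp * pp == num_gpus:
            yield dp, tp, pp


def toy_iter_time(dp, tp, pp, layers=24, micro_batches=8):
    """Toy cost: per-device compute plus crude penalties for tensor-parallel
    communication and the pipeline bubble. Constants are illustrative only."""
    compute = layers / (dp * tp * pp)             # work split across all GPUs
    tp_comm = 0.05 * layers * (tp - 1)            # all-reduce cost grows with tp
    bubble = compute * (pp - 1) / micro_batches   # pipeline fill/drain overhead
    return compute + tp_comm + bubble


num_gpus = 4 * 4  # 4 nodes x 4 T4 GPUs, as in the paper's setup
best = min(candidate_strategies(num_gpus), key=lambda s: toy_iter_time(*s))
print("best (dp, tp, pp):", best)
```

In the paper, the ranking step would be driven by AMP's analytical cost model rather than the `toy_iter_time` stand-in above.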
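
The "Software Dependencies" row notes that the paper states the underlying system is DeepSpeed with fp16 optimization enabled, without version numbers or a full configuration. Below is a minimal, assumed sketch of what such a DeepSpeed configuration could look like; only the fp16 flag and the global batch size of 32 come from the paper, and the file name and launcher invocation are generic illustrations.

```python
# Minimal, assumed DeepSpeed configuration sketch. Only fp16 and the global
# batch size of 32 are confirmed by the paper; everything else is illustrative.
import json

ds_config = {
    "train_batch_size": 32,      # global batch size reported in the paper
    "fp16": {"enabled": True},   # mixed-precision training, as stated
}

# DeepSpeed reads this as a JSON file passed to the launcher, e.g.:
#   deepspeed train.py --deepspeed --deepspeed_config ds_config.json
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```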