Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters
Authors: WenZheng Zhang, Yang Hu, Jing Shi, Xiaoying Bai
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on three heterogeneous clusters, comprising six different types of GPUs, demonstrate that Poplar achieves a training throughput improvement of 1.02-3.92x over current state-of-the-art heterogeneous training systems. |
| Researcher Affiliation | Academia | 1 School of Computer Science, Peking University; 2 Center for Information Research, Academy of Military Sciences; 3 Advanced Institute of Big Data. EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Heterogeneity Aware of each GPU |
| Open Source Code | No | We will publish all source codes of this work on Github for further research explorations. |
| Open Datasets | Yes | All experiments are evaluated on wikitext2-v1 dataset (Merity et al. 2016). |
| Dataset Splits | No | All experiments are evaluated on wikitext2-v1 dataset (Merity et al. 2016). |
| Hardware Specification | Yes | Our experiments are conducted on three heterogeneous GPU clusters, each cluster contains two types of GPUs, as shown in Table 1. ... A100 80GB + A100 40GB; V100 16GB + T4 16GB; A800 80GB + V100S 32GB |
| Software Dependencies | No | We have implemented our work on PyTorch with around 2000+ lines of code. |
| Experiment Setup | Yes | We maintain a global batch size of 2 million tokens throughout our experiments. |