Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters
Authors: WenZheng Zhang, Yang Hu, Jing Shi, Xiaoying Bai
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on three heterogeneous clusters, comprising six different types of GPUs, demonstrate that Poplar achieves a training throughput improvement of 1.02-3.92x over current state-of-the-art heterogeneous training systems. |
| Researcher Affiliation | Academia | 1 School of Computer Science, Peking University; 2 Center for Information Research, Academy of Military Sciences; 3 Advanced Institute of Big Data. EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Heterogeneity Aware of each GPU |
| Open Source Code | No | We will publish all source codes of this work on Github for further research explorations. |
| Open Datasets | Yes | All experiments are evaluated on wikitext2-v1 dataset (Merity et al. 2016). |
| Dataset Splits | No | All experiments are evaluated on wikitext2-v1 dataset (Merity et al. 2016). |
| Hardware Specification | Yes | Our experiments are conducted on three heterogeneous GPU clusters, each cluster contains two types of GPUs, as shown in Table 1. ... A100 80GB + A100 40GB; V100 16GB + T4 16GB; A800 80GB + V100S 32GB |
| Software Dependencies | No | We have implemented our work on PyTorch with around 2000+ lines of code. |
| Experiment Setup | Yes | We maintain a global batch size of 2 million tokens throughout our experiments. |