Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Pfeife: Automatic Pipeline Parallelism for PyTorch

Authors: Ho Young Jhoo, Chung-Kil Hur, Nuno P. Lopes

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate Pfeife in three ways: (1) applicability of the approach, (2) accuracy of cost estimations, and (3) end-to-end performance comparison with existing frameworks. ... Table 1: Throughput comparison of pipeline parallelism (item/s).
Researcher Affiliation	Collaboration	¹Seoul National University, Republic of Korea ²INESC-ID / Instituto Superior Técnico, University of Lisbon, Portugal ³FuriosaAI, Republic of Korea.
Pseudocode	Yes	Algorithm 1 shows the pseudo-code. ... Algorithm 1 Graph-schedule co-optimization.
Open Source Code	Yes	Pfeife¹ ... ¹Available at https://github.com/MerHS/pfeife.
Open Datasets	Yes	We used TorchBench (Hao et al., 2023), which is the official PyTorch benchmark suite. It includes a wide range of models. ... Vision Transformer (ViT-g/14) (Zhai et al., 2022) and GPT2-large (Radford et al., 2019) ... (Llama2-7B) (Touvron et al., 2023), and a diffusion model (Stable Diffusion-XL) (Podell et al., 2023)
Dataset Splits	No	The paper refers to using datasets like TorchBench, ViT-g/14, Llama2-7B, and Stable Diffusion-XL. However, it does not explicitly provide details on how these datasets were split into training, validation, or test sets for the experiments conducted in this paper. It mentions "mini-batch size" and "total batch count" but not data partitioning for evaluation.
Hardware Specification	Yes	For coverage and correctness, we used a small server with 8x NVIDIA RTX 3090 24 GiB GPUs with 4 NVLink connections. For the end-to-end experiments, we used a larger server with 8x A100 40GB GPUs with NVSwitch.
Software Dependencies	Yes	ML models are written in plain PyTorch. They are then compiled using PyTorch 2's torch.compile (Ansel et al., 2024), as it is now common.
Experiment Setup	Yes	Listing 1 shows an example of the full code required to train a model with Pfeife... optimizer = torch.optim.Adam(main_model.parameters(), lr=1e-5) criterion = torch.nn.CrossEntropyLoss() ... (B) Total batch count: Number of mini-batches. (N_l) Loop count: How many times the forward loop is executed. (B_l) Loop batch count: How many mini-batches go through the forward pass of a single stage. (B_f) Prefetch batch count: A list with the number of forward passes each device runs in addition to B_l before it runs its first backward pass. |B_f| = |D|.
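
The schedule parameters quoted in the Experiment Setup row can be made concrete with a small Python sketch. This is an illustration only and not Pfeife's API: the class name `ScheduleParams`, its field names, the helper method, and the example values are all hypothetical. It only encodes what the quoted definitions state, namely that the prefetch list has one entry per device (|B_f| = |D|) and that each device runs B_l plus its prefetch count of forward passes before its first backward pass.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScheduleParams:
    """Hypothetical container for the quoted schedule parameters (not Pfeife's API)."""

    total_batch_count: int           # B: number of mini-batches
    loop_count: int                  # N_l: times the forward loop is executed
    loop_batch_count: int            # B_l: mini-batches through one stage's forward pass
    prefetch_batch_count: List[int]  # B_f: extra forward passes per device
    num_devices: int                 # |D|: number of devices

    def __post_init__(self) -> None:
        # The quoted constraint |B_f| = |D|: one prefetch entry per device.
        if len(self.prefetch_batch_count) != self.num_devices:
            raise ValueError("prefetch_batch_count needs one entry per device")
        if any(count < 0 for count in self.prefetch_batch_count):
            raise ValueError("prefetch counts must be non-negative")

    def forwards_before_first_backward(self) -> List[int]:
        # Per device: B_l forward passes plus that device's prefetch count.
        return [self.loop_batch_count + extra for extra in self.prefetch_batch_count]


# Made-up values for a 4-device run, chosen only to exercise the sketch.
params = ScheduleParams(
    total_batch_count=8,
    loop_count=2,
    loop_batch_count=1,
    prefetch_batch_count=[3, 2, 1, 0],
    num_devices=4,
)
print(params.forwards_before_first_backward())
```

The example values are not taken from the paper; they only show how the length constraint and the per-device forward count follow from the definitions.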