Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Pfeife: Automatic Pipeline Parallelism for PyTorch
Authors: Ho Young Jhoo, Chung-Kil Hur, Nuno P. Lopes
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Pfeife in three ways: (1) applicability of the approach, (2) accuracy of cost estimations, and (3) end-to-end performance comparison with existing frameworks. ... Table 1: Throughput comparison of pipeline parallelism (item/s). |
| Researcher Affiliation | Collaboration | (1) Seoul National University, Republic of Korea; (2) INESC-ID / Instituto Superior Técnico, University of Lisbon, Portugal; (3) Furiosa AI, Republic of Korea. |
| Pseudocode | Yes | Algorithm 1 shows the pseudo-code. ... Algorithm 1 Graph-schedule co-optimization. |
| Open Source Code | Yes | Pfeife is available at https://github.com/MerHS/pfeife. |
| Open Datasets | Yes | We used TorchBench (Hao et al., 2023), which is the official PyTorch benchmark suite. It includes a wide range of models. ... Vision Transformer (ViT-g/14) (Zhai et al., 2022) and GPT2-large (Radford et al., 2019) ... Llama2-7B) (Touvron et al., 2023), and a diffusion model (Stable Diffusion-XL) (Podell et al., 2023) |
| Dataset Splits | No | The paper refers to using datasets like TorchBench, ViT-g/14, Llama2-7B, and Stable Diffusion-XL. However, it does not explicitly provide details on how these datasets were split into training, validation, or test sets for the experiments conducted in this paper. It mentions "mini-batch size" and "total batch count" but not data partitioning for evaluation. |
| Hardware Specification | Yes | For coverage and correctness, we used a small server with 8x NVIDIA RTX 3090 24 GiB GPUs with 4 NVLink connections. For the end-to-end experiments, we used a larger server with 8x A100 40GB GPUs with NVSwitch. |
| Software Dependencies | Yes | ML models are written in plain PyTorch. They are then compiled using PyTorch 2's torch.compile (Ansel et al., 2024), as it is now common. |
| Experiment Setup | Yes | Listing 1 shows an example of the full code required to train a model with Pfeife... optimizer = torch.optim.Adam(main_model.parameters(), lr=1e-5) criterion = torch.nn.CrossEntropyLoss() ... (B) Total batch count: Number of mini-batches. (N_l) Loop count: How many times the forward loop is executed. (B_l) Loop batch count: How many mini-batches go through the forward pass of a single stage. (B_f) Prefetch batch count: A list with the number of forward passes each device runs in addition to B_l before it runs its first backward pass. \|B_f\| = \|D\|. |
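
The schedule parameters quoted in the Experiment Setup row can be made concrete with a small sketch. The class, field, and method names below are hypothetical and are not Pfeife's API. The code only encodes what the quoted text states: the prefetch list has one entry per device, and each device runs the loop batch count plus its prefetch count of forward passes before its first backward pass. The positivity checks are an added assumption, and the numbers in the usage lines are illustrative, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScheduleParams:
    """Hypothetical container for the four quoted schedule parameters."""

    total_batch_count: int            # B: number of mini-batches
    loop_count: int                   # N_l: times the forward loop is executed
    loop_batch_count: int             # B_l: mini-batches per forward pass of a stage
    prefetch_batch_counts: List[int]  # B_f: extra forward passes, one entry per device

    def validate(self, num_devices: int) -> None:
        """Check the one constraint stated in the source: |B_f| = |D|."""
        # Assumption (not from the source): the three counts are positive
        # and prefetch counts are non-negative.
        if min(self.total_batch_count, self.loop_count, self.loop_batch_count) < 1:
            raise ValueError("B, N_l and B_l are assumed to be at least 1")
        if any(extra < 0 for extra in self.prefetch_batch_counts):
            raise ValueError("prefetch counts are assumed to be non-negative")
        if len(self.prefetch_batch_counts) != num_devices:
            raise ValueError(
                f"expected {num_devices} prefetch entries, "
                f"got {len(self.prefetch_batch_counts)}"
            )

    def forwards_before_first_backward(self) -> List[int]:
        """Forward passes each device runs before its first backward pass."""
        return [self.loop_batch_count + extra for extra in self.prefetch_batch_counts]


# Illustrative values only: four devices, earlier devices prefetch more.
params = ScheduleParams(
    total_batch_count=8,
    loop_count=1,
    loop_batch_count=2,
    prefetch_batch_counts=[3, 2, 1, 0],
)
params.validate(num_devices=4)
print(params.forwards_before_first_backward())
```

Keeping the length check separate from construction lets the same parameter set be validated against different device counts, which is a design choice of this sketch rather than something the source prescribes.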