Practical Performance Guarantees for Pipelined DNN Inference

Authors: Aaron Archer, Matthew Fahrbach, Kuikui Liu, Prakash Prabhu

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Applying these methods to a diverse testbed of 369 production models, for k ∈ {2, 4, 8, 16, 32, 64}, we empirically show that these lower bounds are strong enough to be useful in practice.
Researcher Affiliation | Collaboration | ¹Google, ²MIT. Correspondence to: Matthew Fahrbach <fahrbach@google.com>.
Pseudocode | Yes | Algorithm 1: Optimal MTPP slicing of topological order π into at most k blocks. (A hedged sketch of this slicing step follows the table.)
Open Source Code | No | The paper does not include any explicit statements about releasing the code for the described methodology, nor does it provide a direct link to a source-code repository.
Open Datasets | Yes | We also run the same experiment on 1000 publicly available synthetic model graphs from REGAL (Paliwal et al., 2020).
Dataset Splits | No | The paper does not specify training, validation, or test dataset splits. The experiments are conducted on a collection of computation graphs, which are models themselves, not datasets that are typically split for model training and evaluation.
Hardware Specification | Yes | Each instance is run on a heterogeneous cluster containing, e.g., Intel Xeon Platinum 8173M @ 2.00GHz processors, and the best lower bound proven in a fixed time limit is reported.
Software Dependencies | Yes | We solve the MIPs using a combination of Gurobi v9.0.2 (Gurobi Optimization, LLC, 2023) and SCIP v7.0.1 (Bestuzheva et al., 2021).
Experiment Setup | Yes | BRKGA: We run BRKGA with BrkgaSortAndSliceDecoder (Algorithm 2), and we set the population size to 100 and the number of generations to 100, for 10^4 total candidate evaluations, which we denote as brkga-10000 in Table 2. brkga-100 sets the population size to 10 and the number of generations to 10, for 100 total candidate evaluations. (A hedged decoder sketch follows the table.)
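
The Pseudocode row above cites Algorithm 1, which optimally slices a fixed topological order into at most k contiguous blocks. The sketch below is a minimal reconstruction of that idea under a simplifying assumption: a block's cost is taken to be the plain sum of its nodes' work, whereas the paper's MTPP cost model is richer (e.g., it can charge inter-block communication). The function name and the O(k·n²) dynamic program are illustrative, not the paper's implementation.

```python
from itertools import accumulate

def optimal_slicing(work: list[float], k: int) -> tuple[float, list[int]]:
    """Slice `work` (node costs in topological order) into <= k contiguous
    blocks, minimizing the bottleneck (maximum block cost).

    Simplified stand-in for the paper's Algorithm 1: block cost here is the
    plain sum of node work, with no communication term.
    """
    n = len(work)
    prefix = [0.0, *accumulate(work)]  # prefix[i] = total work of nodes [0, i)

    INF = float("inf")
    # dp[j][i]: best bottleneck covering the first i nodes with exactly j blocks.
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(1, n + 1):
            for s in range(i):  # last block covers nodes [s, i)
                cand = max(dp[j - 1][s], prefix[i] - prefix[s])
                if cand < dp[j][i]:
                    dp[j][i], cut[j][i] = cand, s

    best_j = min(range(1, k + 1), key=lambda j: dp[j][n])
    cuts, i, j = [], n, best_j
    while j > 0:  # walk back through the stored split points
        i = cut[j][i]
        if i > 0:
            cuts.append(i)
        j -= 1
    return dp[best_j][n], sorted(cuts)

# Optimal bottleneck is 14 (one optimal slicing: [3, 1, 4, 1] | [5, 9] | [2, 6]).
print(optimal_slicing([3, 1, 4, 1, 5, 9, 2, 6], 3))
```

A binary search over the bottleneck value with a greedy feasibility check gives the same answer faster; the cubic DP is used here only because it mirrors the slicing recurrence most directly.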
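The Experiment Setup row references a sort-and-slice decoder (the paper's Algorithm 2). One plausible reading, sketched below under stated assumptions, is that each BRKGA chromosome assigns a random key to every node, the keys break ties in Kahn's algorithm to select a topological order, and that order is then sliced optimally with the dynamic program above, with the resulting bottleneck serving as the chromosome's fitness. The names `sort_and_slice_fitness`, `succ`, and `indeg` are hypothetical.

```python
import heapq

def sort_and_slice_fitness(keys, succ, indeg, work, k):
    """Hypothetical decoder in the spirit of the paper's Algorithm 2.

    `keys[v]` in [0, 1) is node v's random key from the chromosome. Among
    the ready nodes, Kahn's algorithm always pops the smallest key, so the
    chromosome selects a topological order; that order is then sliced with
    `optimal_slicing` from the previous sketch, and the bottleneck is the
    fitness that BRKGA minimizes.
    """
    remaining = list(indeg)  # mutable copy of in-degrees
    ready = [(keys[v], v) for v in range(len(keys)) if remaining[v] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, v = heapq.heappop(ready)
        order.append(v)
        for u in succ[v]:
            remaining[u] -= 1
            if remaining[u] == 0:
                heapq.heappush(ready, (keys[u], u))
    bottleneck, _ = optimal_slicing([work[v] for v in order], k)
    return bottleneck
```

Under this reading, brkga-10000 in the table corresponds to 100 chromosomes evolved for 100 generations (10^4 decoder calls), and brkga-100 to 10 chromosomes for 10 generations.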