Efficient Algorithms for Device Placement of DNN Graph Operators

Authors: Jakub M. Tarnawski, Amar Phanishayee, Nikhil Devanur, Divya Mahajan, Fanny Nina Paravecino

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the applicability and efficiency of our approaches using several contemporary DNN computation graphs. We evaluate our partitioning algorithms for the different scenarios described above on a variety of modern DNN workloads (7 DNNs, 16 layer and operator graphs). We find that the placements are efficient and result in non-trivial optimal splits; non-contiguous splits outperform all the techniques, with an improvement of up to 2× over expert (average 1.46×), 2.08× over local search (average 1.29×) [MKA07], 1.21× over PipeDream (average 1.10×) [NHP+19], and 7.69× over Scotch (average 1.50×) [Pel09]. (Illustrative sketches of the placement objective follow the table.)
Researcher Affiliation | Industry | Jakub Tarnawski (Microsoft Research), Amar Phanishayee (Microsoft Research), Nikhil Devanur (Amazon), Divya Mahajan (Microsoft), Fanny Nina Paravecino (Microsoft)
Pseudocode | No | The paper describes algorithms using mathematical formulations and prose, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code and workloads used for evaluations are available at https://github.com/msr-fiddle/dnn-partitioning.
Open Datasets | No | The paper mentions using 'BERT', 'ResNet50', 'Inception-v3', and 'GNMT' as DNN models/workloads, but does not provide concrete access information (links, citations with author/year, or repository names) for the specific datasets used in their evaluation.
Dataset Splits | No | The paper discusses splitting DNN models across accelerators for parallelism, but it does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or explicit splitting methodology) for reproducing the data partitioning.
Hardware Specification | Yes | The DNN workloads are split across 6 accelerators of the same type (GPU for layer graphs, a hardware accelerator representing TPUs or FPGAs for operator graphs). We use 3 accelerators in case of the smaller BERT-3 and BERT-6 models. Each accelerator has 16 GB of DRAM and is connected to the CPU over a PCIe 3.0 interconnect.
Software Dependencies | No | The paper mentions using a 'commercial-grade solver [GO19]' (the Gurobi optimizer), but does not provide specific version numbers for this or any other software component used in the experiments.
Experiment Setup | Yes | More details about our experimental setup, graph topology, and implementations can be found in Appendix E.
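
For context on the "splits" compared in the results quoted above, the sketch below scores a candidate device placement of an operator graph by its per-device bottleneck (compute load plus cross-device traffic), a common proxy for pipelined throughput. This is a minimal illustration only: the graph, the costs, the ~12 GB/s bandwidth constant, and the exact objective are assumptions for the example, not the paper's ILP/DP formulation.

```python
# Minimal sketch (not the paper's formulation): score a candidate device
# placement of a DNN operator graph by the load of its most loaded device,
# a common proxy for pipelined throughput. All costs and names are illustrative.
from collections import defaultdict

def bottleneck(compute_cost, edges, placement):
    """compute_cost: {op: seconds}, edges: [(src, dst, bytes)],
    placement: {op: device id}. Returns the most loaded device's total cost."""
    load = defaultdict(float)
    for op, seconds in compute_cost.items():
        load[placement[op]] += seconds
    for src, dst, nbytes in edges:
        if placement[src] != placement[dst]:
            # Charge cross-device traffic to the sending device; the divisor is
            # an assumed ~12 GB/s effective PCIe 3.0 x16 bandwidth.
            load[placement[src]] += nbytes / 12e9
    return max(load.values())

# Tiny chain a -> b -> c -> d on 2 devices: an unbalanced vs. a balanced split.
costs = {"a": 3.0, "b": 1.0, "c": 1.0, "d": 3.0}
edges = [("a", "b", 1e6), ("b", "c", 1e6), ("c", "d", 1e6)]
print(bottleneck(costs, edges, {"a": 0, "b": 1, "c": 1, "d": 1}))  # -> 5.0
print(bottleneck(costs, edges, {"a": 0, "b": 0, "c": 1, "d": 1}))  # -> ~4.0001
```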
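As a further illustration of the contiguous baseline that the non-contiguous placements are compared against, here is a textbook dynamic program for splitting a linear chain of layers into contiguous stages so that the heaviest stage is as light as possible. It is a generic sketch under simplified assumptions (no communication or memory terms) and is not the algorithm from the paper.

```python
# Textbook sketch, not the paper's algorithm: split a chain of layers into k
# contiguous stages, minimizing the cost of the heaviest stage. Communication
# and memory constraints are deliberately ignored here.
from itertools import accumulate

def best_contiguous_split(costs, k):
    """costs: per-layer compute costs in chain order (requires k <= len(costs)).
    Returns (bottleneck, cut_points), where cut_points are the starting layer
    indices of stages 2..k."""
    n = len(costs)
    prefix = [0.0] + list(accumulate(costs))       # prefix[i] = total cost of layers 0..i-1
    INF = float("inf")
    dp  = [[INF] * (k + 1) for _ in range(n + 1)]  # dp[i][j]: best bottleneck placing the
    cut = [[0]   * (k + 1) for _ in range(n + 1)]  # first i layers on j devices
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            for s in range(j - 1, i):              # last stage covers layers s..i-1
                cand = max(dp[s][j - 1], prefix[i] - prefix[s])
                if cand < dp[i][j]:
                    dp[i][j], cut[i][j] = cand, s
    # Walk the cut table backwards to recover where each stage starts.
    starts, i = [], n
    for j in range(k, 0, -1):
        starts.append(cut[i][j])
        i = cut[i][j]
    return dp[n][k], starts[::-1][1:]              # drop stage 1's start (always 0)

# Example: 8 layers split across 3 devices; prints (7, [2, 5]),
# i.e. stages [4, 2], [1, 3, 3], [1, 2, 4] with bottleneck 7.
print(best_contiguous_split([4, 2, 1, 3, 3, 1, 2, 4], 3))
```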