PyTorch: An Imperative Style, High-Performance Deep Learning Library

Authors: Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 6 Evaluation: In this section we compare the performance of PyTorch with several other commonly-used deep learning libraries, and find that it achieves competitive performance across a range of tasks. All experiments were performed on a workstation with two Intel Xeon E5-2698 v4 CPUs and one NVIDIA Quadro GP100 GPU. 6.1 Asynchronous dataflow: We start by quantifying the ability of PyTorch to asynchronously execute dataflow on GPU. We use the built-in profiler [44] to instrument various benchmarks and record a timeline of the execution of a single training step. 6.2 Memory management: We used the NVIDIA profiler to trace the execution of the CUDA runtime as well as the execution of the CUDA kernels launched during one training iteration of the ResNet-50 model. 6.3 Benchmarks: Finally, we can get an overall sense of single-machine eager mode performance of PyTorch by comparing it to three popular graph-based deep learning frameworks (CNTK, MXNet and TensorFlow), a define-by-run framework (Chainer), and a production-oriented platform (PaddlePaddle). Our results are summarized in Table 1. (A profiler usage sketch follows the table below.)
Researcher Affiliation | Collaboration | Adam Paszke (University of Warsaw) adam.paszke@gmail.com; Sam Gross (Facebook AI Research) sgross@fb.com; Francisco Massa (Facebook AI Research) fmassa@fb.com; Adam Lerer (Facebook AI Research) alerer@fb.com; James Bradbury (Google) jekbradbury@gmail.com; Gregory Chanan (Facebook AI Research) gchanan@fb.com; Trevor Killeen (Self Employed) killeent@cs.washington.edu; Zeming Lin (Facebook AI Research) zlin@fb.com; Natalia Gimelshein (NVIDIA) ngimelshein@nvidia.com; Luca Antiga (Orobix) luca.antiga@orobix.com; Alban Desmaison (Oxford University) alban@robots.ox.ac.uk; Andreas Köpf (Xamla) andreas.koepf@xamla.com; Edward Yang (Facebook AI Research) ezyang@fb.com; Zachary DeVito (Facebook AI Research) zdevito@cs.stanford.edu; Martin Raison (Nabla) martinraison@gmail.com; Alykhan Tejani (Twitter) atejani@twitter.com; Sasank Chilamkurthy (Qure.ai) sasankchilamkurthy@gmail.com; Benoit Steiner (Facebook AI Research) benoitsteiner@fb.com; Lu Fang (Facebook) lufang@fb.com; Junjie Bai (Facebook) jbai@fb.com; Soumith Chintala (Facebook AI Research) soumith@gmail.com
Pseudocode | Yes | Listing 1: A custom layer used as a building block for a simple but complete neural network. (...) Listing 2: Simplified training of a generative adversarial network. (A sketch in the spirit of Listing 1 follows the table below.)
Open Source Code | Yes | This paper introduces PyTorch, a Python library that performs immediate execution of dynamic tensor computations with automatic differentiation and GPU acceleration, and does so while maintaining performance comparable to the fastest current libraries for deep learning. (A minimal example of this imperative style follows the table below.)
Open Datasets | Yes | Table 1: Training speed for 6 models using 32-bit floats. Throughput is measured in images per second for the AlexNet, VGG-19, ResNet-50, and MobileNet models, in tokens per second for the GNMTv2 model, and in samples per second for the NCF model. (...) The Appendix details all the steps needed to reproduce our setup.
Dataset Splits | Yes | The Appendix details all the steps needed to reproduce our setup.
Hardware Specification | Yes | All experiments were performed on a workstation with two Intel Xeon E5-2698 v4 CPUs and one NVIDIA Quadro GP100 GPU.
Software Dependencies | Yes | The PyTorch team. PyTorch Autograd Profiler. https://pytorch.org/docs/1.0.1/autograd.html#profiler
Experiment Setup | Yes | The Appendix details all the steps needed to reproduce our setup.
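
The Research Type row quotes the paper's use of the built-in profiler [44] to record a timeline of a single training step. The following is a minimal sketch of how such a trace can be captured with torch.autograd.profiler; the ResNet-50 model from torchvision, the batch size, and the output file name are illustrative assumptions, not the paper's exact benchmark code.

    import torch
    import torchvision.models as models  # assumption: torchvision is installed

    # Sketch: profile one training step on the GPU and export a timeline.
    model = models.resnet50().cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    inputs = torch.randn(32, 3, 224, 224, device="cuda")
    targets = torch.randint(0, 1000, (32,), device="cuda")

    with torch.autograd.profiler.profile(use_cuda=True) as prof:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        loss.backward()
        optimizer.step()

    # Summarize per-operator cost and dump a Chrome-trace timeline of the step.
    print(prof.key_averages().table(sort_by="cuda_time_total"))
    prof.export_chrome_trace("training_step_timeline.json")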
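The Pseudocode row cites Listing 1, a custom layer used as a building block for a simple but complete network. The sketch below reconstructs that idea, with a custom nn.Module holding learnable nn.Parameter tensors composed alongside a built-in layer; the layer sizes and input shape here are illustrative assumptions rather than the paper's exact listing.

    import torch
    from torch import nn

    class LinearLayer(nn.Module):
        """Custom affine layer: weight and bias registered as learnable Parameters."""
        def __init__(self, in_sz, out_sz):
            super().__init__()
            self.w = nn.Parameter(torch.randn(in_sz, out_sz))
            self.b = nn.Parameter(torch.randn(out_sz))

        def forward(self, activations):
            # Ordinary imperative tensor code defines the forward pass.
            return torch.mm(activations, self.w) + self.b

    class FullBasicModel(nn.Module):
        """Small but complete network composing a built-in and a custom layer."""
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(1, 16, kernel_size=3)   # illustrative sizes
            self.fc = LinearLayer(16 * 26 * 26, 10)

        def forward(self, x):
            t = torch.relu(self.conv(x))
            t = t.flatten(1)                              # flatten for the affine layer
            return torch.softmax(self.fc(t), dim=-1)

    out = FullBasicModel()(torch.randn(4, 1, 28, 28))     # e.g. a batch of 28x28 images
    print(out.shape)                                      # torch.Size([4, 10])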
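The Open Source Code row quotes the paper's summary of PyTorch as immediate execution of dynamic tensor computations with automatic differentiation and GPU acceleration. Below is a minimal sketch of that imperative style; the toy least-squares objective is chosen purely for illustration.

    import torch

    # Device placement is an explicit, ordinary call rather than a graph-level setting.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    x = torch.randn(64, 3, device=device)
    y = torch.randn(64, 1, device=device)
    w = torch.randn(3, 1, device=device, requires_grad=True)

    loss = ((x @ w - y) ** 2).mean()  # evaluated eagerly, no separate graph compilation step
    loss.backward()                   # reverse-mode automatic differentiation
    print(w.grad.norm())              # gradients are stored on the tensor itself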