Efficient Combination of Rematerialization and Offloading for Training DNNs

Authors: Olivier Beaumont, Lionel Eyraud-Dubois, Alena Shilova

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We measured running times and memory occupation of several networks from the PyTorch torchvision package: resnet, densenet and inception, with a batch size of 16 and images of 500×500 pixels. The simulation results presented here were obtained using 4 cores of a 24-core Haswell Intel® Xeon® E5-2680 v3 at 2.5 GHz, with 128 GB of memory, and used about one hour of computation.
Researcher Affiliation | Academia | Inria Bordeaux {olivier.beaumont, lionel.eyraud-dubois, alena.shilova}@inria.fr
Pseudocode | No | To save space, we will not detail in the main paper all the equations of the dynamic program, which involves a large number of cases. We will focus in the main part of the paper on the intuitions and the general working principle of the dynamic program and refer the reader to Appendix B for detailed derivations and proofs.
Open Source Code | Yes | We have implemented a preliminary version of our best performing algorithms (pofo and autocapper) and made them available in rotor [1]. [1] Rotor. https://gitlab.inria.fr/hiepacs/rotor, 2019.
Open Datasets | Yes | We measured running times and memory occupation of several networks from the PyTorch torchvision package: resnet, densenet and inception, with a batch size of 16 and images of 500×500 pixels.
Dataset Splits | No | The authors marked 'N/A' for specifying all training details, including data splits, in their reproducibility checklist.
Hardware Specification | Yes | Time measurements were performed on an NVIDIA Tesla V100 GPU. We also measured the bandwidth obtained when transferring PyTorch tensors to and from the GPU and obtained 12 GB/s. The simulation results presented here were obtained using 4 cores of a 24-core Haswell Intel® Xeon® E5-2680 v3 at 2.5 GHz, with 128 GB of memory, and used about one hour of computation.
Software Dependencies | Yes | In the recent release of PyTorch 1.10, the introduction of the saved_tensors_hooks() feature makes it possible to implement the offloading technique described in this paper.
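To illustrate the PyTorch 1.10 mechanism the quote refers to, here is a minimal sketch of activation offloading via torch.autograd.graph.saved_tensors_hooks. This is not the paper's implementation (which lives in rotor): the pack hook moves each activation saved for backward to a target device, and the unpack hook moves it back on demand. The tiny model and the CPU offload_device are illustrative assumptions; on a real setup the model would sit on a GPU and activations would be offloaded to CPU memory.

```python
import torch
import torch.nn as nn

# Hypothetical target for offloaded activations; with a GPU model this
# would be "cpu" while the model itself lives on "cuda".
offload_device = "cpu"

def pack_hook(tensor):
    # Called when autograd saves an activation for the backward pass:
    # move it off the compute device.
    return tensor.to(offload_device, non_blocking=True)

def unpack_hook(tensor):
    # Called when backward needs the activation again: bring it back.
    # On a GPU setup this would be tensor.to("cuda", non_blocking=True).
    return tensor

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))
x = torch.randn(4, 8)

# All tensors saved for backward inside this context go through the hooks.
with torch.autograd.graph.saved_tensors_hooks(pack_hook, unpack_hook):
    loss = model(x).sum()
loss.backward()
```

PyTorch also ships torch.autograd.graph.save_on_cpu(), a built-in context manager covering exactly this CPU-offload pattern, including optional pinned-memory transfers.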
Experiment Setup | Yes | We measured running times and memory occupation of several networks from the PyTorch torchvision package: resnet, densenet and inception, with a batch size of 16 and images of 500×500 pixels.