MODeL: Memory Optimizations for Deep Learning

Authors: Benoit Steiner, Mostafa Elhoushi, Jacob Kahn, James Hegarty

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally demonstrate that MODeL only takes seconds to allow the training of neural networks using 30% less memory on average. We study empirically the practicality and effectiveness of our solution on a wide variety of DNNs, which achieves average memory savings exceeding 30% in a median time of less than 7 seconds.
Researcher Affiliation | Industry | 1 Anthropic, San Francisco, USA; 2 Meta, FAIR, Menlo Park, USA; 3 Meta, Reality Labs, Seattle, USA. Correspondence to: Benoit Steiner <benoit@anthropic.com>.
Pseudocode | Yes | Function 1 GenerateExecutionSequence(C) and Function 2 IsInTransitiveFanin(v1, v2, cache) (a sketch of the latter follows the table).
Open Source Code | Yes | MODeL is an open-source project available at https://github.com/facebookresearch/model_opt.
Open Datasets | Yes | We included the ResNet (He et al., 2016) and Transformer (Vaswani et al., 2017) models... We also included neural networks designed for specific tasks, such as computer vision (AlexNet (Krizhevsky et al., 2012), VGG (Simonyan & Zisserman, 2015)), video understanding (ResNet3D (Tran et al., 2018)), and large language models (BERT (Devlin et al., 2018), XLM-R (Conneau et al., 2019)).
Dataset Splits | No | The paper describes using various neural networks and batch sizes, but does not provide specific details on training, validation, or test dataset splits.
Hardware Specification | Yes | We ran all our experiments on a workstation featuring an Intel Xeon Gold 6138 CPU running at 2.0 GHz and an NVIDIA A100 GPU.
Software Dependencies | Yes | We implemented MODeL on top of PyTorch version 1.11 (Paszke et al., 2019) with torchtext 0.12 and torchvision 0.12... We encoded and solved the memory optimizations problem (equation 9) using Gurobi version 9.1.1 (Gurobi Optimization, LLC, 2022).
Experiment Setup | Yes | Additionally, we trained the neural networks at batch size 1 and 32 (a setup sketch follows the table).
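
The Pseudocode row above names Function 2, IsInTransitiveFanin(v1, v2, cache). Below is a minimal sketch of what such a memoized transitive fan-in check could look like, assuming a hypothetical node representation in which each node exposes a fanin list of its direct predecessors; this is an illustration, not the paper's implementation.

    def is_in_transitive_fanin(v1, v2, cache):
        # Return True if v2 is reachable from v1 by following predecessor edges.
        key = (v1, v2)
        if key in cache:                    # reuse previously computed answers
            return cache[key]
        result = False
        for pred in v1.fanin:               # hypothetical list of direct predecessors
            if pred is v2 or is_in_transitive_fanin(pred, v2, cache):
                result = True
                break
        cache[key] = result                 # memoize before returning
        return result

Here cache is a plain dict shared across calls, so repeated reachability queries over the same graph are amortized rather than recomputed.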
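
The Software Dependencies and Experiment Setup rows mention PyTorch 1.11 with torchvision 0.12 and training at batch sizes 1 and 32. The following is a minimal sketch of that kind of setup, using a torchvision ResNet with synthetic inputs; it is an illustration under these assumptions, not the authors' benchmark harness.

    import torch
    import torchvision

    model = torchvision.models.resnet18(num_classes=1000)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for batch_size in (1, 32):
        images = torch.randn(batch_size, 3, 224, 224)    # synthetic image batch
        labels = torch.randint(0, 1000, (batch_size,))   # synthetic labels
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()                                  # one training step per batch size
        optimizer.step()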