MODeL: Memory Optimizations for Deep Learning
Authors: Benoit Steiner, Mostafa Elhoushi, Jacob Kahn, James Hegarty
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally demonstrate that MODeL only takes seconds to allow the training of neural networks using 30% less memory on average. We study empirically the practicality and effectiveness of our solution on a wide variety of DNNs, which achieves average memory savings exceeding 30% in a median time of less than 7 seconds. |
| Researcher Affiliation | Industry | ¹Anthropic, San Francisco, USA; ²Meta, FAIR, Menlo Park, USA; ³Meta, Reality Labs, Seattle, USA. Correspondence to: Benoit Steiner <benoit@anthropic.com>. |
| Pseudocode | Yes | Function 1 GenerateExecutionSequence(C) and Function 2 IsInTransitiveFanin(v1, v2, cache) (a hedged sketch of such a reachability check appears below the table) |
| Open Source Code | Yes | MODeL is an open-source project available at https://github.com/facebookresearch/model_opt. |
| Open Datasets | Yes | We included the ResNet (He et al., 2016) and Transformer (Vaswani et al., 2017) models... We also included neural networks designed for specific tasks, such as computer vision (AlexNet (Krizhevsky et al., 2012), VGG (Simonyan & Zisserman, 2015)), video understanding (ResNet3D (Tran et al., 2018)), and large language models (BERT (Devlin et al., 2018), XLM-R (Conneau et al., 2019)). |
| Dataset Splits | No | The paper describes using various neural networks and batch sizes, but does not provide specific details on training, validation, or test dataset splits. |
| Hardware Specification | Yes | We ran all our experiments on a workstation featuring an Intel Xeon Gold 6138 CPU running at 2.0 GHz and an NVIDIA A100 GPU. |
| Software Dependencies | Yes | We implemented MODeL on top of PyTorch version 1.11 (Paszke et al., 2019) with torchtext 0.12 and torchvision 0.12... We encoded and solved the memory optimization problem (equation 9) using Gurobi version 9.1.1 (Gurobi Optimization, LLC, 2022). (See the ILP sketch below the table.) |
| Experiment Setup | Yes | Additionally, we trained the neural networks at batch sizes 1 and 32. (See the memory-measurement sketch below the table.) |
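
The paper's Function 2 decides whether one node of the dataflow graph lies in the transitive fanin of another, with memoization. As a rough illustration only, here is a minimal Python sketch of such a cached reachability check; the graph representation (a `fanin` dict mapping each node to its direct inputs) and every name below are assumptions, not the authors' implementation.

```python
def is_in_transitive_fanin(v1, v2, fanin, cache):
    """Return True if v2 is reachable from v1 by walking input edges.

    fanin: dict mapping each node to the list of nodes it consumes.
    cache: dict memoizing (v1, v2) -> bool across calls, mirroring the
           cache argument in the paper's pseudocode.
    Assumes an acyclic graph, as DNN dataflow graphs are DAGs.
    """
    key = (v1, v2)
    if key in cache:
        return cache[key]
    result = any(
        pred == v2 or is_in_transitive_fanin(pred, v2, fanin, cache)
        for pred in fanin.get(v1, [])
    )
    cache[key] = result
    return result


# Example: c consumes a and b, and b consumes a.
fanin = {"c": ["a", "b"], "b": ["a"], "a": []}
cache = {}
assert is_in_transitive_fanin("c", "a", fanin, cache)
assert not is_in_transitive_fanin("a", "c", fanin, cache)
```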
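
The memory planning problem is encoded as an integer linear program and handed to Gurobi. The toy sketch below shows the general flavor of such an encoding with gurobipy: binary liveness variables, a peak-memory variable that dominates per-step usage, and a minimized peak. The liveness constraint here (each tensor resident for exactly one step) is a deliberately oversimplified stand-in, not equation 9 from the paper, whose constraints derive from producer/consumer precedence in the dataflow graph; all sizes and names are made up.

```python
import gurobipy as gp
from gurobipy import GRB

sizes = {"a": 4, "b": 2, "c": 3}   # tensor sizes in MB, illustrative only
steps = range(3)                   # execution timesteps

m = gp.Model("toy_memory_plan")
live = m.addVars(sizes.keys(), steps, vtype=GRB.BINARY, name="live")
peak = m.addVar(lb=0.0, name="peak")

# Oversimplified liveness: each tensor is resident for exactly one step.
# The real formulation ties liveness to the scheduled positions of each
# tensor's producer and consumers.
for t in sizes:
    m.addConstr(gp.quicksum(live[t, s] for s in steps) == 1)

# The peak-memory variable must dominate the usage of every step.
for s in steps:
    m.addConstr(peak >= gp.quicksum(sizes[t] * live[t, s] for t in sizes))

m.setObjective(peak, GRB.MINIMIZE)
m.optimize()
print("peak memory:", peak.X)  # solver spreads tensors across steps: peak = 4
```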
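
Since the paper's headline result is a reduction in training memory at these batch sizes, a reader reproducing the setup might start by measuring baseline peak memory. The sketch below does this with PyTorch's CUDA memory statistics; the model (torchvision's resnet18), the stand-in loss, and the input shape are illustrative assumptions rather than the paper's harness, and it requires a CUDA-capable GPU.

```python
import torch
import torchvision

def peak_training_memory_mib(batch_size):
    """Peak GPU memory (MiB) of one forward/backward pass at a given batch size."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    model = torchvision.models.resnet18().cuda()   # illustrative model choice
    x = torch.randn(batch_size, 3, 224, 224, device="cuda")
    loss = model(x).sum()   # stand-in loss so backward() has a scalar
    loss.backward()         # backward keeps activations live, driving the peak
    return torch.cuda.max_memory_allocated() / 2**20

for bs in (1, 32):  # the batch sizes quoted from the experiment setup
    print(f"batch size {bs}: {peak_training_memory_mib(bs):.0f} MiB peak")
```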