MODeL: Memory Optimizations for Deep Learning

Authors: Benoit Steiner, Mostafa Elhoushi, Jacob Kahn, James Hegarty

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally demonstrate that MODeL only takes seconds to allow the training of neural networks using 30% less memory on average. We study empirically the practicality and effectiveness of our solution on a wide variety of DNNs, which achieves average memory savings exceeding 30% in a median time of less than 7 seconds.
Researcher Affiliation | Industry | 1 Anthropic, San Francisco, USA; 2 Meta, FAIR, Menlo Park, USA; 3 Meta, Reality Labs, Seattle, USA. Correspondence to: Benoit Steiner <benoit@anthropic.com>.
Pseudocode | Yes | Function 1 GenerateExecutionSequence(C) and Function 2 IsInTransitiveFanin(v1, v2, cache) (a sketch of the latter follows the table).
Open Source Code | Yes | MODeL is an open-source project available at https://github.com/facebookresearch/model_opt.
Open Datasets | Yes | We included the ResNet (He et al., 2016) and Transformer (Vaswani et al., 2017) models... We also included neural networks designed for specific tasks, such as computer vision (AlexNet (Krizhevsky et al., 2012), VGG (Simonyan & Zisserman, 2015)), video understanding (ResNet3D (Tran et al., 2018)), and large language models (BERT (Devlin et al., 2018), XLM-R (Conneau et al., 2019)).
Dataset Splits | No | The paper describes using various neural networks and batch sizes, but does not provide specific details on training, validation, or test dataset splits.
Hardware Specification | Yes | We ran all our experiments on a workstation featuring an Intel Xeon Gold 6138 CPU running at 2.0 GHz and an NVIDIA A100 GPU.
Software Dependencies | Yes | We implemented MODeL on top of PyTorch version 1.11 (Paszke et al., 2019) with torchtext 0.12 and torchvision 0.12... We encoded and solved the memory optimizations problem (equation 9) using Gurobi version 9.1.1 (Gurobi Optimization, LLC, 2022).
Experiment Setup | Yes | Additionally, we trained the neural networks at batch size 1 and 32 (a setup sketch follows the table).
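
The Pseudocode row above names Function 2, IsInTransitiveFanin(v1, v2, cache). Below is a minimal sketch of what such a memoized transitive fan-in check could look like, assuming a hypothetical node representation in which each node exposes a fanin list of its direct predecessors; this is an illustration, not the paper's implementation.

    def is_in_transitive_fanin(v1, v2, cache):
        # Return True if v2 is reachable from v1 by following predecessor edges.
        key = (v1, v2)
        if key in cache:                    # reuse previously computed answers
            return cache[key]
        result = False
        for pred in v1.fanin:               # hypothetical list of direct predecessors
            if pred is v2 or is_in_transitive_fanin(pred, v2, cache):
                result = True
                break
        cache[key] = result                 # memoize before returning
        return result

Here cache is a plain dict shared across calls, so repeated reachability queries over the same graph are amortized rather than recomputed.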
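
The Software Dependencies and Experiment Setup rows mention PyTorch 1.11 with torchvision 0.12 and training at batch sizes 1 and 32. The following is a minimal sketch of that kind of setup, using a torchvision ResNet with synthetic inputs; it is an illustration under these assumptions, not the authors' benchmark harness.

    import torch
    import torchvision

    model = torchvision.models.resnet18(num_classes=1000)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for batch_size in (1, 32):
        images = torch.randn(batch_size, 3, 224, 224)    # synthetic image batch
        labels = torch.randint(0, 1000, (batch_size,))   # synthetic labels
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()                                  # one training step per batch size
        optimizer.step()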