Memory Optimization for Deep Networks
Authors: Aashaka Shah, Chao-Yuan Wu, Jayashree Mohan, Vijay Chidambaram, Philipp Kraehenbuehl
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments, demonstrating that MONeT significantly outperforms existing automatic frameworks that use local or global techniques. On multiple architectures (ResNet (He et al., 2016), VGG (Simonyan & Zisserman, 2015), UNet (Ronneberger et al., 2015), GoogLeNet (Szegedy et al., 2015), MobileNet-V2 (Sandler et al., 2018)), memory budgets (5-10 GB), and network configurations (multiple resolutions), MONeT consistently achieves lower memory footprints at equivalent or lower computational overhead. |
| Researcher Affiliation | Collaboration | Aashaka Shah1, Chao-Yuan Wu1, Jayashree Mohan1, Vijay Chidambaram1,2, Philipp Krähenbühl1 1University of Texas at Austin 2VMware Research |
| Pseudocode | Yes | Algorithm 1: Forward Pass and Algorithm 2: Backward Pass |
| Open Source Code | Yes | Our code is available at https://github.com/utsaslab/MONeT. |
| Open Datasets | Yes | On multiple architectures (ResNet (He et al., 2016), VGG (Simonyan & Zisserman, 2015), UNet (Ronneberger et al., 2015), GoogLeNet (Szegedy et al., 2015), MobileNet-V2 (Sandler et al., 2018)), and we evaluate it on 3DUNet (Çiçek et al., 2016). These are well-known models/architectures typically trained on public datasets. |
| Dataset Splits | No | The paper evaluates performance on various architectures (e.g., ResNet, VGG, UNet) but does not explicitly state the training, validation, and test dataset splits used for these models, or if any specific splits were used beyond implied standard practices for these well-known architectures. |
| Hardware Specification | Yes | All checkpointing schedules are run using the same software implementations and costs are profiled on the same hardware (NVIDIA P100 GPUs). Batch size for the experiments is fixed to be the maximum at which the model can be trained using baseline PyTorch on a 16 GB GPU. |
| Software Dependencies | Yes | We develop MONeT in PyTorch v1.5.1 and solve the joint optimization problem using the Gurobi (2014) solver. We solve the joint optimization problem using the CVXPY (Diamond & Boyd, 2016; Agrawal et al., 2018) modeling language and GUROBI (Gurobi, 2014) solver. We also implement SSDC using NVIDIA's cuSPARSE library (function cusparseSdense2csr) with CUDA Toolkit version 10.1 using PyTorch's C++ extensions. |
| Experiment Setup | Yes | The UNet experiments use 608×416 inputs following prior work (Jain et al., 2019). All other experiments use 224×224 inputs following conventions (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; He et al., 2016). Batch size for the experiments is fixed to be the maximum at which the model can be trained using baseline PyTorch on a 16 GB GPU. For our evaluation, we cap the solver time to 24 hours for both MONeT and Checkmate, and run the schedule thus obtained on our execution framework. |
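
The Software Dependencies row states that the joint optimization is modeled in CVXPY and solved with GUROBI. As a rough illustration of that tooling only (not the paper's actual formulation or variables), the toy sketch below poses a tiny checkpoint-vs-recompute choice as a boolean program; the layer costs, memory budget, and variable names are invented for illustration.

```python
# Illustrative sketch: a tiny checkpoint-vs-recompute decision written as a
# boolean program in CVXPY and handed to the GUROBI backend. All numbers and
# names here are made up; this is not MONeT's formulation.
import cvxpy as cp
import numpy as np

n_layers = 6
mem_cost = np.array([2.0, 4.0, 3.0, 5.0, 2.5, 1.5])        # GB to keep each activation
recompute_cost = np.array([1.0, 3.0, 2.0, 6.0, 1.0, 0.5])  # relative time to recompute it
memory_budget = 10.0                                        # GB available for stored activations

# store[i] = 1 means keep activation i in memory; 0 means recompute it in the backward pass.
store = cp.Variable(n_layers, boolean=True)

# Minimize total recomputation time for the activations we do not store,
# subject to the stored activations fitting within the memory budget.
objective = cp.Minimize(recompute_cost @ (1 - store))
constraints = [mem_cost @ store <= memory_budget]

problem = cp.Problem(objective, constraints)
problem.solve(solver=cp.GUROBI)  # requires a Gurobi installation and license

print("keep in memory:", np.round(store.value).astype(int))
print("extra recompute time:", problem.value)
```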
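The Hardware Specification and Experiment Setup rows fix the batch size to the largest that baseline PyTorch can train on a 16 GB GPU. A minimal sketch of how such a cap could be probed with PyTorch's peak-memory counters follows; the model choice (torchvision ResNet-50), input resolution, and doubling search are assumptions for illustration, not the authors' procedure.

```python
# Illustrative sketch: find the largest power-of-two batch size whose training
# step stays within a 16 GB budget on the current GPU.
import torch
import torchvision

def fits(batch_size, budget_bytes=16 * 1024**3, resolution=224):
    """Run one training step and report whether peak GPU memory stays in budget."""
    model = torchvision.models.resnet50().cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    torch.cuda.reset_peak_memory_stats()
    try:
        x = torch.randn(batch_size, 3, resolution, resolution, device="cuda")
        y = torch.randint(0, 1000, (batch_size,), device="cuda")
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    except RuntimeError:          # typically a CUDA out-of-memory error
        return False
    finally:
        torch.cuda.empty_cache()
    return torch.cuda.max_memory_allocated() <= budget_bytes

batch = 1                          # assumes batch size 1 fits at all
while fits(batch * 2):             # double until the budget is exceeded
    batch *= 2
print("largest power-of-two batch size under 16 GB:", batch)
```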