Memory Optimization for Deep Networks
Authors: Aashaka Shah, Chao-Yuan Wu, Jayashree Mohan, Vijay Chidambaram, Philipp Kraehenbuehl
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments, demonstrating that MONeT significantly outperforms existing automatic frameworks that use local or global techniques. On multiple architectures (ResNet (He et al., 2016), VGG (Simonyan & Zisserman, 2015), UNet (Ronneberger et al., 2015), GoogLeNet (Szegedy et al., 2015), MobileNet-V2 (Sandler et al., 2018)), memory budgets (5-10 GB), and network configurations (multiple resolutions), MONeT consistently achieves lower memory footprints at equivalent or lower computational overhead. |
| Researcher Affiliation | Collaboration | Aashaka Shah1, Chao-Yuan Wu1, Jayashree Mohan1, Vijay Chidambaram1,2, Philipp Krähenbühl1 1University of Texas at Austin 2VMware Research |
| Pseudocode | Yes | Algorithm 1: Forward Pass and Algorithm 2: Backward Pass |
| Open Source Code | Yes | Our code is available at https://github.com/utsaslab/MONeT. |
| Open Datasets | Yes | On multiple architectures (ResNet (He et al., 2016), VGG (Simonyan & Zisserman, 2015), UNet (Ronneberger et al., 2015), GoogLeNet (Szegedy et al., 2015), MobileNet-V2 (Sandler et al., 2018)), and we evaluate it on 3DUNet (Çiçek et al., 2016). These are well-known models/architectures typically trained on public datasets. |
| Dataset Splits | No | The paper evaluates performance on various architectures (e.g., ResNet, VGG, UNet) but does not explicitly state the training, validation, and test dataset splits used for these models, or if any specific splits were used beyond implied standard practices for these well-known architectures. |
| Hardware Specification | Yes | All checkpointing schedules are run using the same software implementations and costs are profiled on the same hardware (NVIDIA P100 GPUs). Batch size for the experiments is fixed to be the maximum at which the model can be trained using baseline PyTorch on a 16 GB GPU. |
| Software Dependencies | Yes | We develop MONeT in PyTorch v1.5.1 and solve the joint optimization problem using the Gurobi (2014) solver. We solve the joint optimization problem using the CVXPY (Diamond & Boyd, 2016; Agrawal et al., 2018) modeling language and GUROBI (Gurobi, 2014) solver. We also implement SSDC using NVIDIA's cuSPARSE library (function cusparseSdense2csr) with CUDA Toolkit version 10.1 using PyTorch's C++ extensions. |
| Experiment Setup | Yes | The UNet experiments use 608×416 inputs following prior work (Jain et al., 2019). All other experiments use 224×224 inputs following conventions (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; He et al., 2016). Batch size for the experiments is fixed to be the maximum at which the model can be trained using baseline PyTorch on a 16 GB GPU. For our evaluation, we cap the solver time to 24 hours for both MONeT and Checkmate, and run the schedule thus obtained on our execution framework. |
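
The Software Dependencies row states that the joint optimization is modeled in CVXPY and solved with GUROBI. As a rough illustration of that tooling only (not the paper's actual formulation or variables), the toy sketch below poses a tiny checkpoint-vs-recompute choice as a boolean program; the layer costs, memory budget, and variable names are invented for illustration.

```python
# Illustrative sketch: a tiny checkpoint-vs-recompute decision written as a
# boolean program in CVXPY and handed to the GUROBI backend. All numbers and
# names here are made up; this is not MONeT's formulation.
import cvxpy as cp
import numpy as np

n_layers = 6
mem_cost = np.array([2.0, 4.0, 3.0, 5.0, 2.5, 1.5])        # GB to keep each activation
recompute_cost = np.array([1.0, 3.0, 2.0, 6.0, 1.0, 0.5])  # relative time to recompute it
memory_budget = 10.0                                        # GB available for stored activations

# store[i] = 1 means keep activation i in memory; 0 means recompute it in the backward pass.
store = cp.Variable(n_layers, boolean=True)

# Minimize total recomputation time for the activations we do not store,
# subject to the stored activations fitting within the memory budget.
objective = cp.Minimize(recompute_cost @ (1 - store))
constraints = [mem_cost @ store <= memory_budget]

problem = cp.Problem(objective, constraints)
problem.solve(solver=cp.GUROBI)  # requires a Gurobi installation and license

print("keep in memory:", np.round(store.value).astype(int))
print("extra recompute time:", problem.value)
```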
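The Hardware Specification and Experiment Setup rows fix the batch size to the largest that baseline PyTorch can train on a 16 GB GPU. A minimal sketch of how such a cap could be probed with PyTorch's peak-memory counters follows; the model choice (torchvision ResNet-50), input resolution, and doubling search are assumptions for illustration, not the authors' procedure.

```python
# Illustrative sketch: find the largest power-of-two batch size whose training
# step stays within a 16 GB budget on the current GPU.
import torch
import torchvision

def fits(batch_size, budget_bytes=16 * 1024**3, resolution=224):
    """Run one training step and report whether peak GPU memory stays in budget."""
    model = torchvision.models.resnet50().cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    torch.cuda.reset_peak_memory_stats()
    try:
        x = torch.randn(batch_size, 3, resolution, resolution, device="cuda")
        y = torch.randint(0, 1000, (batch_size,), device="cuda")
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    except RuntimeError:          # typically a CUDA out-of-memory error
        return False
    finally:
        torch.cuda.empty_cache()
    return torch.cuda.max_memory_allocated() <= budget_bytes

batch = 1                          # assumes batch size 1 fits at all
while fits(batch * 2):             # double until the budget is exceeded
    batch *= 2
print("largest power-of-two batch size under 16 GB:", batch)
```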