Efficient Rematerialization for Deep Networks
Authors: Ravi Kumar, Manish Purohit, Zoya Svitkina, Erik Vee, Joshua Wang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate the performance of these algorithms on many common deep learning models. ... We experimentally evaluate the performance of our rematerialization algorithm on computational graphs for training commonly used deep neural networks. |
| Researcher Affiliation | Industry | Ravi Kumar Google Research Mountain View, CA 94043 ravi.k53@gmail.com Manish Purohit Google Research Mountain View, CA 94043 mpurohit@google.com Zoya Svitkina Google Research Mountain View, CA 94043 zoya@google.com Erik Vee Google Research Mountain View, CA 94043 erikvee@google.com Joshua R. Wang Google Research Mountain View, CA 94043 joshuawang@google.com |
| Pseudocode | Yes | Algorithm 1: Efficient Rematerialization via Tree Decomposition. |
| Open Source Code | No | The paper refers to third-party open-source implementations (e.g., official ResNet and Transformer models in TensorFlow) that they used for evaluation, but does not provide specific links or statements about releasing their own implementation code for the rematerialization algorithm. |
| Open Datasets | Yes | We use the official implementation of the ResNet model for the ImageNet task in TensorFlow. ... (i) Deep Residual Networks (ResNet): We first consider deep residual networks (ResNet) [13] as an example of convolutional networks for image classification. |
| Dataset Splits | No | The paper mentions using models like ResNet, FFN, and Transformer, and conducting experiments, but it does not specify explicit training, validation, or test dataset split percentages or counts. |
| Hardware Specification | No | The paper mentions general hardware such as 'GPUs and AI accelerators' and 'GPU and CPU memory' but does not specify any exact models (e.g., NVIDIA V100, Intel Xeon), quantities, or detailed system specifications used for their experiments. |
| Software Dependencies | No | The paper mentions using 'TensorFlow' for model implementations and 'XLA' as a baseline, but it does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We use different configurations to measure the effect of network depth (number of convolutional layers) on memory requirements of schedules obtained by the algorithms. ... For this experiment, we setup a simple feed-forward network with ReLU activations (number of hidden layers is varied) and randomly generated inputs and outputs. We use mean squared error loss and train using standard gradient descent. ... Again, we use the official implementation of Transformer in TensorFlow with all hyperparameters set to recommended defaults. |
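
The "Experiment Setup" row above describes a feed-forward network with ReLU activations, randomly generated inputs and outputs, mean squared error loss, and standard gradient descent. The following is a minimal sketch of such a setup in TensorFlow, not the authors' code; all layer widths, data dimensions, the depth, and the learning rate are illustrative assumptions, since the paper excerpt does not report them.

```python
import tensorflow as tf

# Assumed configuration values (not reported in the quoted excerpt).
NUM_HIDDEN_LAYERS = 4   # the paper varies this; exact range not given here
HIDDEN_UNITS = 256
BATCH_SIZE = 32
INPUT_DIM = 128
OUTPUT_DIM = 10

# Randomly generated inputs and targets, as in the described setup.
x = tf.random.normal([BATCH_SIZE, INPUT_DIM])
y = tf.random.normal([BATCH_SIZE, OUTPUT_DIM])

# Feed-forward network with ReLU activations and a linear output layer.
layers = [tf.keras.layers.Dense(HIDDEN_UNITS, activation="relu")
          for _ in range(NUM_HIDDEN_LAYERS)]
layers.append(tf.keras.layers.Dense(OUTPUT_DIM))
model = tf.keras.Sequential(layers)

# Mean squared error loss trained with plain gradient descent (SGD).
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

for step in range(100):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```

In the paper's experiments, the quantity of interest is not the trained model but the memory footprint of the schedules the rematerialization algorithm produces for the resulting computational graph; the sketch only reconstructs the network and training loop that generate that graph.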