Coop: Memory is not a Commodity
Authors: Jianhao Zhang, Shihan Ma, Peihong Liu, Jinhui Yuan
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated Coop on eight representative DNNs. The experimental results demonstrate that Coop achieves up to 2× memory saving and hugely reduces compute overhead, search latency, and memory fragmentation compared to the state-of-the-art baselines. |
| Researcher Affiliation | Industry | Jianhao Zhang (OneFlow Research, daquexian566@gmail.com); Shihan Ma (OneFlow Research, mmasss1205@gmail.com); Peihong Liu (OneFlow Research, peihong.l@outlook.com); Jinhui Yuan (OneFlow Research, yuanjinhui@oneflow.org) |
| Pseudocode | Yes | Algorithm 1 The algorithm of Coop |
| Open Source Code | No | Coop has been implemented in the OneFlow framework and it can be easily integrated into any other deep learning framework. (OneFlow is an open-source framework, but this statement does not explicitly say *their* implementation of Coop is open-sourced, nor does it provide a direct link for their specific implementation.) |
| Open Datasets | No | The paper refers to common DNN models (e.g., GPT-3 style, Swin-Transformer, ResNet-50), which implies the use of public datasets commonly associated with these models (e.g., ImageNet for ResNet-50). However, it does not explicitly state the datasets used or provide links or citations for them. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It mentions batch sizes for models (e.g., 'BERT Large (16)') but provides no information on how the data was split for training, validation, or testing. |
| Hardware Specification | Yes | All experiments were conducted on a machine equipped with 4 NVIDIA A100 GPUs (80 GB, CUDA 11.7, cuDNN 8.5.0) and 56 Intel(R) Xeon(R) Platinum 8336C CPU cores running Ubuntu 20.04. |
| Software Dependencies | Yes | All experiments were conducted on a machine equipped with 4 NVIDIA A100 GPUs (80 GB, CUDA 11.7, cuDNN 8.5.0) and 56 Intel(R) Xeon(R) Platinum 8336C CPU cores running Ubuntu 20.04. For a fair comparison, we re-implemented all baselines in OneFlow [39], which is an open-source deep learning framework with PyTorch-aligned APIs. |
| Experiment Setup | Yes | For BERT Large and GPT-3 style 2.7B, the Adam optimizer was used, while the SGD optimizer was used for the other experiments. ZeRO stage 2 [40] is used when training the GPT-3 style 2.7B model. Among the eight DNNs, BiLSTM and SPOS have dynamic network structures that vary based on the input. |
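
As an illustration of the per-model optimizer choice quoted in the Experiment Setup row, below is a minimal sketch against OneFlow's PyTorch-aligned optimizer API (not the authors' released code). The model keys, the `make_optimizer` helper, and all hyperparameter values are hypothetical assumptions; the paper only states which optimizer each model used, and the ZeRO stage 2 sharding applied to GPT-3 style 2.7B is not shown here.

```python
# Minimal sketch of the optimizer selection described in the Experiment Setup
# row, using OneFlow's PyTorch-aligned API. Model keys and hyperparameters
# are illustrative assumptions, not values reported in the paper.
import oneflow as flow

# Per the paper: Adam for BERT Large and GPT-3 style 2.7B, SGD for the rest.
ADAM_MODELS = {"bert_large", "gpt3_style_2.7b"}

def make_optimizer(model_name: str, params):
    """Return the optimizer type the paper reports for each model."""
    if model_name in ADAM_MODELS:
        # Learning rate is assumed; the paper does not report it here.
        return flow.optim.Adam(params, lr=1e-4)
    # Learning rate and momentum are likewise assumed.
    return flow.optim.SGD(params, lr=0.1, momentum=0.9)
```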