Coop: Memory is not a Commodity

Authors: Jianhao Zhang, Shihan Ma, Peihong Liu, Jinhui Yuan

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated Coop on eight representative DNNs. The experimental results demonstrate that Coop achieves up to 2× memory saving and hugely reduces compute overhead, search latency, and memory fragmentation compared to the state-of-the-art baselines.
Researcher Affiliation | Industry | Jianhao Zhang, OneFlow Research, daquexian566@gmail.com; Shihan Ma, OneFlow Research, mmasss1205@gmail.com; Peihong Liu, OneFlow Research, peihong.l@outlook.com; Jinhui Yuan, OneFlow Research, yuanjinhui@oneflow.org
Pseudocode | Yes | Algorithm 1: The algorithm of Coop
Open Source Code | No | Coop has been implemented in OneFlow framework and it can be easily integrated into any other deep learning framework. (OneFlow is an open-source framework, but this statement does not explicitly say *their* implementation of Coop is open-sourced, nor does it provide a direct link for their specific implementation.)
Open Datasets | No | The paper refers to common DNN models (e.g., GPT-3 style, Swin-Transformer, ResNet-50), which implies the use of public datasets commonly associated with these models (e.g., ImageNet for ResNet-50). However, it does not explicitly state the datasets used or provide links or citations for them.
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It mentions batch sizes for models (e.g., 'BERT Large (16)') but gives no information on how the data was split for training, validation, or testing.
Hardware Specification | Yes | All experiments were conducted on a machine equipped with 4 NVIDIA A100 GPUs (80 GB, CUDA 11.7, cuDNN 8.5.0) and 56 Intel(R) Xeon(R) Platinum 8336C CPU cores running Ubuntu 20.04.
Software Dependencies | Yes | All experiments were conducted on a machine equipped with 4 NVIDIA A100 GPUs (80 GB, CUDA 11.7, cuDNN 8.5.0) and 56 Intel(R) Xeon(R) Platinum 8336C CPU cores running Ubuntu 20.04. For a fair comparison, we re-implemented all baselines in OneFlow [39], which is an open-source deep learning framework with PyTorch-aligned APIs.
Experiment Setup | Yes | For BERT Large and GPT-3 style 2.7B, the Adam optimizer was used, while the SGD optimizer was used for the other experiments. ZeRO stage 2 [40] is used when training the GPT-3 style 2.7B model. Among the eight DNNs, BiLSTM and SPOS have dynamic network structures that vary based on the input. (A minimal optimizer-selection sketch follows the table.)
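
The Experiment Setup row assigns optimizers per model: Adam for BERT Large and GPT-3 style 2.7B, SGD for the remaining DNNs. Below is a minimal sketch of that assignment, not the authors' code, assuming PyTorch-style APIs and using a placeholder learning rate (the section above does not report learning rates).

    import torch

    # Models trained with Adam per the Experiment Setup row; the other DNNs use SGD.
    # Model names mirror the labels quoted above.
    ADAM_MODELS = {"BERT Large", "GPT-3 style 2.7B"}

    def make_optimizer(model_name, model, lr=1e-4):
        # lr is a placeholder; the report does not give the learning rates used.
        if model_name in ADAM_MODELS:
            return torch.optim.Adam(model.parameters(), lr=lr)
        return torch.optim.SGD(model.parameters(), lr=lr)

Since the paper states that OneFlow exposes PyTorch-aligned APIs, the same sketch should in principle also run with oneflow imported in place of torch, though that is an assumption about API coverage rather than something the section states.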