Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning
Authors: Woosuk Kwon, Gyeong-In Yu, Eunji Jeong, Byung-Gon Chun
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluation on a variety of neural networks shows that compared to PyTorch, Nimble speeds up inference and training by up to 22.34× and 3.61×, respectively. |
| Researcher Affiliation | Academia | Woosuk Kwon, Gyeong-In Yu, Eunji Jeong, Byung-Gon Chun (Seoul National University) {kws9603,gyeongin,ejjeong,bgchun}@snu.ac.kr |
| Pseudocode | Yes | Algorithm 1: Nimble's stream assignment algorithm (see the sketch below the table). |
| Open Source Code | Yes | Nimble is publicly available at https://github.com/snuspl/nimble. |
| Open Datasets | Yes | We use various neural networks [21, 32, 33, 34, 39], all trained on ImageNet [31]. For example, in the field of computer vision, the CIFAR-10 [24] dataset is widely used among researchers and many neural networks are trained on the dataset. |
| Dataset Splits | No | The paper mentions using specific datasets (e.g., ImageNet, CIFAR-10) and batch sizes (e.g., 'batch size 1', 'batch size 32') for experiments, but it does not explicitly provide details about training, validation, or test dataset splits (e.g., percentages, sample counts, or specific predefined splits with citations). |
| Hardware Specification | Yes | For evaluation, we use an NVIDIA V100 GPU along with 2.10GHz Intel Xeon CPU E5-2695 v4. |
| Software Dependencies | Yes | We implement Nimble on PyTorch v1.4 with CUDA 10.2 and cuDNN 8.0.2. For evaluation, we use an NVIDIA V100 GPU along with 2.10GHz Intel Xeon CPU E5-2695 v4. To evaluate DL inference, we compare Nimble with popular DL frameworks, PyTorch, TorchScript and Caffe2, as well as state-of-the-art inference systems, TensorRT (v7.1) [3] and TVM (v0.6.1) [14]. |
| Experiment Setup | Yes | Figure 2a shows the ratios of the GPU active time... with batch size 1. All neural networks [18, 21, 32, 34] are trained with batch size 32. We implement Nimble on PyTorch v1.4 with CUDA 10.2 and cuDNN 8.0.2. (A reproduction timing sketch follows the table.) |
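For the Pseudocode row: the paper's Algorithm 1 assigns GPU tasks to CUDA streams so that independent operators can run concurrently with as few cross-stream synchronizations as possible. Below is a minimal Python sketch of one plausible reading of that idea, treating stream assignment as a minimum path cover of the task DAG computed with simple augmenting-path bipartite matching; the graph representation, function name, and example are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict

def assign_streams(num_tasks, edges):
    """Sketch: map each task in a DAG to a stream via minimum path cover.

    Tasks on the same path share one stream; an edge whose endpoints land
    on different streams would require a cross-stream synchronization.
    `num_tasks` is the node count; `edges` is a list of (u, v) pairs with
    u preceding v. Illustrative reading only, not Nimble's actual code.
    """
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)

    # Maximum bipartite matching between left and right copies of the nodes
    # (Kuhn's augmenting-path algorithm).
    match_right = [-1] * num_tasks   # right node -> matched left node
    match_left = [-1] * num_tasks    # left node  -> matched right node

    def augment(u, seen):
        for v in succ[u]:
            if seen[v]:
                continue
            seen[v] = True
            if match_right[v] == -1 or augment(match_right[v], seen):
                match_right[v] = u
                match_left[u] = v
                return True
        return False

    for u in range(num_tasks):
        augment(u, [False] * num_tasks)

    # Minimum path cover: each path starts at a node never matched on the
    # right; all nodes on a path share one stream id.
    stream_of = [-1] * num_tasks
    stream_id = 0
    for start in range(num_tasks):
        if match_right[start] == -1:
            node = start
            while node != -1:
                stream_of[node] = stream_id
                node = match_left[node]
            stream_id += 1

    # Edges crossing streams need explicit synchronization (e.g. CUDA events).
    sync_edges = [(u, v) for u, v in edges if stream_of[u] != stream_of[v]]
    return stream_of, sync_edges

# Tiny example: a diamond-shaped graph 0 -> {1, 2} -> 3.
streams, syncs = assign_streams(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
print(streams, syncs)   # two streams; the second branch forks and rejoins via sync edges
```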
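For the Experiment Setup row: a minimal sketch of how one might reproduce the PyTorch inference baseline on a V100-class GPU is shown below, using CUDA events for timing. The choice of ResNet-50, the input size, and the warm-up/timing iteration counts are assumptions for illustration, not the paper's exact measurement script.

```python
import torch
import torchvision.models as models

# Baseline: measure PyTorch GPU inference latency at batch size 1,
# analogous to the inference comparison setup described in the paper.
model = models.resnet50().cuda().eval()
x = torch.randn(1, 3, 224, 224, device="cuda")

with torch.no_grad():
    for _ in range(50):           # warm-up iterations (count is an assumption)
        model(x)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(100):          # timed iterations (count is an assumption)
        model(x)
    end.record()
    torch.cuda.synchronize()

print(f"mean latency: {start.elapsed_time(end) / 100:.3f} ms")
```

The same harness could be pointed at a Nimble-captured module or another backend to compare end-to-end latencies under identical inputs.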