Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

Authors: Woosuk Kwon, Gyeong-In Yu, Eunji Jeong, Byung-Gon Chun

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluation on a variety of neural networks shows that, compared to PyTorch, Nimble speeds up inference and training by up to 22.34× and 3.61×, respectively.
Researcher Affiliation | Academia | Woosuk Kwon, Gyeong-In Yu, Eunji Jeong, Byung-Gon Chun, Seoul National University {kws9603,gyeongin,ejjeong,bgchun}@snu.ac.kr
Pseudocode | Yes | Algorithm 1: Nimble's stream assignment algorithm.
Open Source Code | Yes | Nimble is publicly available at https://github.com/snuspl/nimble.
Open Datasets | Yes | We use various neural networks [21, 32, 33, 34, 39], all trained on ImageNet [31]. For example, in the field of computer vision, the CIFAR-10 [24] dataset is widely used among researchers and many neural networks are trained on the dataset.
Dataset Splits | No | The paper mentions specific datasets (e.g., ImageNet, CIFAR-10) and batch sizes (e.g., batch size 1, batch size 32) for its experiments, but it does not explicitly provide training/validation/test split details (e.g., percentages, sample counts, or predefined splits with citations).
Hardware Specification | Yes | For evaluation, we use an NVIDIA V100 GPU along with 2.10GHz Intel Xeon CPU E5-2695 v4.
Software Dependencies | Yes | We implement Nimble on PyTorch v1.4 with CUDA 10.2 and cuDNN 8.0.2. For evaluation, we use an NVIDIA V100 GPU along with 2.10GHz Intel Xeon CPU E5-2695 v4. To evaluate DL inference, we compare Nimble with popular DL frameworks, PyTorch, TorchScript and Caffe2, as well as state-of-the-art inference systems, TensorRT (v7.1) [3] and TVM (v0.6.1) [14]. (An environment-check sketch follows the table.)
Experiment Setup | Yes | Figure 2a shows the ratios of the GPU active time... with batch size 1. All neural networks [18, 21, 32, 34] are trained with batch size 32. We implement Nimble on PyTorch v1.4 with CUDA 10.2 and cuDNN 8.0.2. (A latency-timing sketch follows the table.)
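
Before attempting to reproduce these numbers, it can help to compare the local software and hardware stack against the versions reported in the Software Dependencies and Hardware Specification rows (PyTorch v1.4, CUDA 10.2, cuDNN 8.0.2, NVIDIA V100). The snippet below is a minimal sketch using standard PyTorch APIs only; it is not taken from the paper or the Nimble repository.

```python
import torch

# Minimal environment check against the setup reported in the table:
# PyTorch v1.4, CUDA 10.2, cuDNN 8.0.2, NVIDIA V100 GPU.
print("PyTorch:", torch.__version__)
print("CUDA:   ", torch.version.cuda)
print("cuDNN:  ", torch.backends.cudnn.version())

if torch.cuda.is_available():
    print("GPU:    ", torch.cuda.get_device_name(0))
else:
    print("GPU:     none detected")
```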
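The Experiment Setup row reports inference measured at batch size 1 and training at batch size 32, but the measurement harness itself is not quoted. The sketch below is one hypothetical way to time batch-size-1 GPU inference of an ImageNet-style model in PyTorch; the model choice (torchvision's ResNet-50), the warm-up count, and the iteration count are assumptions for illustration, not details from the paper.

```python
import torch
import torchvision.models as models

# Hypothetical batch-size-1 inference timing; ResNet-50 stands in for the
# ImageNet-trained networks cited in the paper. Warm-up and iteration
# counts are assumed values, not taken from the paper.
model = models.resnet50(pretrained=True).cuda().eval()
x = torch.randn(1, 3, 224, 224, device="cuda")

with torch.no_grad():
    for _ in range(10):          # warm-up iterations (assumed)
        model(x)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(100):         # timed iterations (assumed)
        model(x)
    end.record()
    torch.cuda.synchronize()

print("mean latency: %.3f ms" % (start.elapsed_time(end) / 100))
```

CUDA events are used instead of wall-clock timing so that the measurement accounts for asynchronous kernel execution on the GPU.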