Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

Authors: Woosuk Kwon, Gyeong-In Yu, Eunji Jeong, Byung-Gon Chun

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluation on a variety of neural networks shows that, compared to PyTorch, Nimble speeds up inference and training by up to 22.34× and 3.61×, respectively.
Researcher Affiliation | Academia | Woosuk Kwon, Gyeong-In Yu, Eunji Jeong, Byung-Gon Chun, Seoul National University {kws9603,gyeongin,ejjeong,bgchun}@snu.ac.kr
Pseudocode | Yes | Algorithm 1: Nimble's stream assignment algorithm.
Open Source Code | Yes | Nimble is publicly available at https://github.com/snuspl/nimble.
Open Datasets | Yes | We use various neural networks [21, 32, 33, 34, 39], all trained on ImageNet [31]. For example, in the field of computer vision, the CIFAR-10 [24] dataset is widely used among researchers and many neural networks are trained on the dataset.
Dataset Splits | No | The paper mentions specific datasets (e.g., ImageNet, CIFAR-10) and batch sizes (e.g., batch size 1, batch size 32) for its experiments, but it does not explicitly provide training/validation/test split details (e.g., percentages, sample counts, or predefined splits with citations).
Hardware Specification | Yes | For evaluation, we use an NVIDIA V100 GPU along with 2.10GHz Intel Xeon CPU E5-2695 v4.
Software Dependencies | Yes | We implement Nimble on PyTorch v1.4 with CUDA 10.2 and cuDNN 8.0.2. For evaluation, we use an NVIDIA V100 GPU along with 2.10GHz Intel Xeon CPU E5-2695 v4. To evaluate DL inference, we compare Nimble with popular DL frameworks, PyTorch, TorchScript and Caffe2, as well as state-of-the-art inference systems, TensorRT (v7.1) [3] and TVM (v0.6.1) [14]. (An environment-check sketch follows the table.)
Experiment Setup | Yes | Figure 2a shows the ratios of the GPU active time... with batch size 1. All neural networks [18, 21, 32, 34] are trained with batch size 32. We implement Nimble on PyTorch v1.4 with CUDA 10.2 and cuDNN 8.0.2. (A latency-timing sketch follows the table.)
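
Before attempting to reproduce these numbers, it can help to compare the local software and hardware stack against the versions reported in the Software Dependencies and Hardware Specification rows (PyTorch v1.4, CUDA 10.2, cuDNN 8.0.2, NVIDIA V100). The snippet below is a minimal sketch using standard PyTorch APIs only; it is not taken from the paper or the Nimble repository.

```python
import torch

# Minimal environment check against the setup reported in the table:
# PyTorch v1.4, CUDA 10.2, cuDNN 8.0.2, NVIDIA V100 GPU.
print("PyTorch:", torch.__version__)
print("CUDA:   ", torch.version.cuda)
print("cuDNN:  ", torch.backends.cudnn.version())

if torch.cuda.is_available():
    print("GPU:    ", torch.cuda.get_device_name(0))
else:
    print("GPU:     none detected")
```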
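The Experiment Setup row reports inference measured at batch size 1 and training at batch size 32, but the measurement harness itself is not quoted. The sketch below is one hypothetical way to time batch-size-1 GPU inference of an ImageNet-style model in PyTorch; the model choice (torchvision's ResNet-50), the warm-up count, and the iteration count are assumptions for illustration, not details from the paper.

```python
import torch
import torchvision.models as models

# Hypothetical batch-size-1 inference timing; ResNet-50 stands in for the
# ImageNet-trained networks cited in the paper. Warm-up and iteration
# counts are assumed values, not taken from the paper.
model = models.resnet50(pretrained=True).cuda().eval()
x = torch.randn(1, 3, 224, 224, device="cuda")

with torch.no_grad():
    for _ in range(10):          # warm-up iterations (assumed)
        model(x)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(100):         # timed iterations (assumed)
        model(x)
    end.record()
    torch.cuda.synchronize()

print("mean latency: %.3f ms" % (start.elapsed_time(end) / 100))
```

CUDA events are used instead of wall-clock timing so that the measurement accounts for asynchronous kernel execution on the GPU.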