Scalable Distributed DL Training: Batching Communication and Computation

Authors: Shaoqi Wang, Aidi Pi, Xiaobo Zhou

AAAI 2019, pp. 5289-5296 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implement iBatch in the open-source DL framework BigDL and perform evaluations with various DL workloads. Experimental results show that iBatch improves the scalability of a cluster of 72 nodes by up to 73% over the default PS and 41% over the layer-by-layer strategy.
Researcher Affiliation | Academia | Shaoqi Wang, Aidi Pi, Xiaobo Zhou, Department of Computer Science, University of Colorado, Colorado Springs, CO, USA. {swang, epi, xzhou}@uccs.edu
Pseudocode | Yes | Algorithm 1: Greedy algorithm that generates l_i from l_1 to l_{N-1}
Open Source Code | No | "We have implemented iBatch in BigDL (version 0.5.0) by modifying source files in package com.intel.analytics.bigdl." The paper states that BigDL is open-source but does not explicitly state that the iBatch implementation itself is released, nor does it provide a link to the modified source files.
Open Datasets | Yes | We use two well-known image classification datasets: (1) ImageNet22K, the largest public dataset for image classification, including 14.2 million labeled images from 21,841 categories; (2) ILSVRC12, a subset of ImageNet22K that has 1.28 million training images.
Dataset Splits | No | The paper mentions using datasets for training but does not provide specific details on how the data was split into training, validation, and test sets (e.g., percentages or counts).
Hardware Specification | Yes | We conduct our experiments on a CPU cluster in a private cloud. The cloud runs on 8 HP BL460c G6 blade servers interconnected with 10Gbps Ethernet.
Software Dependencies | Yes | We have implemented iBatch in BigDL (version 0.5.0) by modifying source files in package com.intel.analytics.bigdl.
Experiment Setup | Yes | The goal of iBatch is to minimize the execution time, which comprises the total parameter communication time and the forward computation time. We first formulate the batching decision as an optimization problem of execution-time minimization, based on profiles of the parameter communication time and the forward computation time. Then we use a greedy algorithm that maximizes the overlap to solve the problem and derive the communication and computation batches.
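
The table above quotes the paper's description of Algorithm 1 (a greedy pass that generates batch boundaries l_1 through l_{N-1}) without reproducing it. As a rough illustration only, here is a minimal Python sketch of a greedy, overlap-maximizing batching pass, assuming profiled per-layer communication times comm[i] and forward-computation times comp[i]. The function name, the overlap-budget rule, and the example numbers are all assumptions made for illustration; they are not the authors' actual Algorithm 1.

```python
# Hedged sketch of greedy communication/computation batching in the
# spirit of iBatch (Wang, Pi, Zhou, AAAI 2019). The grouping rule below
# is an illustrative assumption, not the paper's exact algorithm.

def greedy_batches(comm, comp):
    """Group layers 0..N-1 into communication batches.

    comm[i]: profiled parameter-communication time of layer i
    comp[i]: profiled forward-computation time of layer i
    Returns a list of batches (each a list of layer indices), chosen so
    that each batch's communication is hidden, as far as possible,
    behind the forward computation of the preceding batch.
    """
    n = len(comm)
    batches = [[0]]            # layer 0's parameters must be fetched up front
    pending_comm = 0.0         # communication accumulated in the current batch
    overlap_budget = comp[0]   # computation time available to hide it behind
    for i in range(1, n):
        if pending_comm + comm[i] <= overlap_budget:
            # Layer i's parameters can still be fetched while the
            # previous batch computes: extend the current batch.
            batches[-1].append(i)
            pending_comm += comm[i]
        else:
            # Overlap budget exhausted: close the batch, start a new one,
            # and recompute the budget from the batch just closed.
            batches.append([i])
            pending_comm = comm[i]
            overlap_budget = sum(comp[j] for j in batches[-2])
    return batches

# Example with made-up per-layer profiles (seconds)
comm = [0.05, 0.02, 0.01, 0.08, 0.03]
comp = [0.10, 0.04, 0.03, 0.09, 0.05]
print(greedy_batches(comm, comp))  # [[0, 1, 2], [3, 4]]
```

Under these assumptions, the example groups layers 0-2 into one batch because the communication of layers 1 and 2 (0.03s) fits within layer 0's computation time (0.10s), which mirrors the overlap goal described in the Experiment Setup row.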