Scalable Distributed DL Training: Batching Communication and Computation
Authors: Shaoqi Wang, Aidi Pi, Xiaobo Zhou
AAAI 2019, pp. 5289-5296
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement iBatch in the open-source DL framework BigDL and perform evaluations with various DL workloads. Experimental results show that iBatch improves the scalability of a cluster of 72 nodes by up to 73% over the default PS and 41% over the layer-by-layer strategy. |
| Researcher Affiliation | Academia | Shaoqi Wang, Aidi Pi, Xiaobo Zhou, Department of Computer Science, University of Colorado, Colorado Springs, CO, USA ({swang, epi, xzhou}@uccs.edu) |
| Pseudocode | Yes | Algorithm 1: Greedy algorithm that generates l_i from l_1 to l_{N-1} |
| Open Source Code | No | We have implemented iBatch in BigDL (version 0.5.0) by modifying source files in the package com.intel.analytics.bigdl. The paper states that BigDL is open source but does not explicitly state that the iBatch implementation itself is released, nor does it provide a link to the modified source files. |
| Open Datasets | Yes | We use two well-known image classification datasets: (1) ImageNet22K, the largest public dataset for image classification, comprising 14.2 million labeled images from 21,841 categories; and (2) ILSVRC12, a subset of ImageNet22K with 1.28 million training images. |
| Dataset Splits | No | The paper mentions using datasets for training but does not provide specific details on how the data was split into training, validation, and test sets (e.g., percentages or counts). |
| Hardware Specification | Yes | We conduct our experiments on a CPU cluster in a private cloud. The cloud runs on 8 HP BL460c G6 blade servers interconnected with 10Gbps global Ethernet. |
| Software Dependencies | Yes | We have implemented iBatch in BigDL (version 0.5.0) by modifying source files in the package com.intel.analytics.bigdl. |
| Experiment Setup | Yes | The goal of iBatch is to minimize the execution time, including the total parameter communication time and the forward computation time. We first formulate the batching decision as an optimization problem of execution-time minimization based on profiles of the parameter communication time and the forward computation time. Then we use a greedy algorithm that maximizes the overlap to solve the problem and derive communication and computation batches (a sketch follows this table). |
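
For illustration, the sketch below is one plausible, hypothetical rendering of the greedy overlap-maximizing batching pass quoted in the Pseudocode and Experiment Setup rows; it is not the authors' Algorithm 1. It assumes per-layer profiles `comm` and `comp` (measured parameter-communication and forward-computation times per layer); the function name `greedy_batches`, the batch-boundary criterion, and the example numbers are all assumptions made for this example.

```python
def greedy_batches(comm, comp):
    """Hypothetical greedy batching pass: grow the current communication
    batch while its accumulated communication time can still be hidden
    behind the forward computation of the layers already processed;
    otherwise close the batch and start a new one.

    comm[i]: profiled parameter-communication time of layer i (seconds)
    comp[i]: profiled forward-computation time of layer i (seconds)
    Returns a list of (start, end) layer-index ranges, one per batch.
    """
    batches = []
    start = 0
    pending_comm = 0.0  # communication time queued in the current batch
    overlap = 0.0       # computation time available to hide it
    n = len(comm)
    for i in range(n):
        pending_comm += comm[i]
        if pending_comm > overlap and i > start:
            # Queued communication no longer overlaps with computation:
            # close the batch before layer i and open a new one.
            batches.append((start, i - 1))
            start = i
            pending_comm = comm[i]
            overlap = 0.0
        overlap += comp[i]
    batches.append((start, n - 1))
    return batches

# Toy per-layer profiles (made-up numbers, in seconds):
comm = [0.05, 0.02, 0.08, 0.01, 0.04]
comp = [0.10, 0.03, 0.06, 0.09, 0.02]
print(greedy_batches(comm, comp))  # [(0, 1), (2, 2), (3, 4)]
```

On these toy profiles the pass yields three batches; each boundary is placed where the queued communication time would no longer be hidden by the forward computation accumulated so far, which is the overlap-maximizing intuition the paper describes.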