Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks

Authors: Zhihao Jia, Sina Lin, Charles R. Qi, Alex Aiken

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our evaluation shows that layer-wise parallelism outperforms state-of-the-art approaches by increasing training throughput, reducing communication costs, and achieving better scalability to multiple GPUs, while maintaining original network accuracy. |
| Researcher Affiliation | Collaboration | Stanford University and Microsoft. Correspondence to: Zhihao Jia <zhihao@cs.stanford.edu>. |
| Pseudocode | Yes | Algorithm 1 gives pseudocode that uses node and edge eliminations as subroutines to find an optimal parallelization strategy under the paper's cost model (see the first sketch after this table). |
| Open Source Code | No | The paper states "we implemented our framework in Legion..." but provides no access link or statement about releasing the source code for the implementation. |
| Open Datasets | Yes | The runtime performance of all three CNNs is evaluated on the ImageNet-1K dataset (Deng et al., 2009), which consists of 1.2 million images from 1,000 categories. |
| Dataset Splits | No | No specific details on training, validation, or test splits (e.g., percentages, sample counts, or explicit use of the standard splits for the datasets used) were found. |
| Hardware Specification | Yes | All experiments were performed on a GPU cluster with 4 compute nodes, each equipped with two 10-core Intel E5-2600 CPUs, 256 GB of main memory, and four NVIDIA Tesla P100 GPUs. |
| Software Dependencies | Yes | Data-parallelism experiments were run in TensorFlow r1.7, PyTorch v0.3, and the authors' implementation, and their runtime performance was compared. |
| Experiment Setup | Yes | Synchronous training with a per-GPU batch size of 32 is used for all experiments (see the second sketch after this table). |
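The paper's Algorithm 1 is not reproduced here, but the following is a minimal Python sketch of the node/edge-elimination idea the Pseudocode row refers to: per-edge cost tables over candidate per-layer parallelization configurations are merged when edges are parallel (edge elimination) and folded over interior layers (node elimination) until only a source and a sink remain, at which point the cheapest remaining configuration pair gives the cost of an optimal strategy under the cost model. The names and signatures (`configs`, `node_cost`, `xfer_cost`) are illustrative assumptions rather than the authors' Legion/C++ implementation, the sketch assumes the computation graph reduces fully under these two eliminations, and the backtracking needed to recover the configurations themselves is omitted.

```python
from collections import defaultdict

def optimal_strategy_cost(nodes, edges, configs, node_cost, xfer_cost):
    """nodes: layers with a single source and sink at nodes[0] / nodes[-1] (assumption);
    edges: iterable of (u, v) tensor dependencies;
    configs[n]: candidate parallelization configurations for layer n;
    node_cost(n, c): estimated compute time of layer n under config c;
    xfer_cost(u, cu, v, cv): estimated transfer/synchronization time across an edge."""
    table = defaultdict(lambda: defaultdict(float))   # one cost table per (u, v) pair
    in_edges, out_edges = defaultdict(set), defaultdict(set)

    def add_edge(u, v, tbl):
        # Edge elimination: parallel edges between u and v are merged by
        # summing their cost tables entry-wise.
        for key, val in tbl.items():
            table[(u, v)][key] += val
        in_edges[v].add(u)
        out_edges[u].add(v)

    for u, v in edges:
        add_edge(u, v, {(cu, cv): xfer_cost(u, cu, v, cv)
                        for cu in configs[u] for cv in configs[v]})

    src, dst = nodes[0], nodes[-1]
    remaining = set(nodes)
    changed = True
    while changed:
        changed = False
        for w in list(remaining - {src, dst}):
            if len(in_edges[w]) == 1 and len(out_edges[w]) == 1:
                u = next(iter(in_edges[w]))
                v = next(iter(out_edges[w]))
                # Node elimination: fold w's compute cost and its two incident
                # cost tables into a single (u, v) table, minimizing over w's configs.
                merged = {(cu, cv): min(table[(u, w)][(cu, cw)] + node_cost(w, cw)
                                        + table[(w, v)][(cw, cv)]
                                        for cw in configs[w])
                          for cu in configs[u] for cv in configs[v]}
                del table[(u, w)], table[(w, v)]
                out_edges[u].discard(w)
                in_edges[v].discard(w)
                add_edge(u, v, merged)
                remaining.discard(w)
                changed = True

    # Only the source and sink remain: pick the cheapest configuration pair.
    return min(table[(src, dst)][(cs, ct)] + node_cost(src, cs) + node_cost(dst, ct)
               for cs in configs[src] for ct in configs[dst])
```

The appeal of this elimination-based search is that its cost grows with the number of configuration pairs per edge rather than with the exponentially many global strategies, which is what makes an optimal choice under the cost model tractable.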
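For the Experiment Setup and Software Dependencies rows, the following is a minimal sketch of a synchronous data-parallel baseline with a per-GPU batch size of 32. It is written against a modern torch.distributed API rather than the TensorFlow r1.7 / PyTorch v0.3 versions the paper used, and the model, optimizer settings, and random batch are placeholders, not the authors' configuration.

```python
# Launch with: torchrun --nproc_per_node=<num_gpus> this_script.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
import torchvision

def main():
    dist.init_process_group("nccl")                  # one process per GPU
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    model = torchvision.models.resnet50().cuda()     # placeholder network
    model = DDP(model, device_ids=[local_rank])      # gradients all-reduced every step
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()

    per_gpu_batch = 32                               # matches the reported setup
    # Placeholder random batch; a real run would use an ImageNet-1K DataLoader
    # with a DistributedSampler.
    images = torch.randn(per_gpu_batch, 3, 224, 224, device="cuda")
    labels = torch.randint(0, 1000, (per_gpu_batch,), device="cuda")

    for _ in range(10):
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()                              # synchronous gradient all-reduce
        opt.step()

if __name__ == "__main__":
    main()
```

Fixing the per-GPU batch size means the global batch grows with the number of GPUs, so throughput comparisons across GPU counts measure weak scaling of synchronous training, which is consistent with the setup the table reports.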