Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks
Authors: Zhihao Jia, Sina Lin, Charles R. Qi, Alex Aiken
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation shows that layer-wise parallelism outperforms state-of-the-art approaches by increasing training throughput, reducing communication costs, and achieving better scalability to multiple GPUs, while maintaining the original network accuracy. |
| Researcher Affiliation | Collaboration | 1 Stanford University, 2 Microsoft. Correspondence to: Zhihao Jia <zhihao@cs.stanford.edu>. |
| Pseudocode | Yes | Algorithm 1 shows pseudocode that uses node and edge eliminations as subroutines to find an optimal parallelization strategy under the paper's cost model (an illustrative sketch of these eliminations follows the table). |
| Open Source Code | No | The paper states 'we implemented our framework in Legion...' but provides no link to the source code and no statement that the implementation will be released. |
| Open Datasets | Yes | We evaluate the runtime performance of all three CNNs on the ImageNet-1K dataset (Deng et al., 2009) that consists of 1.2 million images from 1,000 categories. |
| Dataset Splits | No | No details on training, validation, or test splits (e.g., percentages, sample counts, or explicit mention of standard splits for the datasets used) were found. |
| Hardware Specification | Yes | All experiments were performed on a GPU cluster with 4 compute nodes, each of which is equipped with two Intel 10-core E5-2600 CPUs, 256G main memory, and four NVIDIA Tesla P100 GPUs. |
| Software Dependencies | Yes | We ran data parallelism experiments in TensorFlow r1.7, PyTorch v0.3, and our implementation and compared the runtime performance. |
| Experiment Setup | Yes | We use synchronous training and a per-GPU batch size of 32 for all experiments. (A data-parallel configuration sketch follows the table.) |
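
The node and edge eliminations referenced in the Pseudocode row can be pictured as operations on per-edge cost tables over pairs of layer configurations. The snippet below is a minimal sketch under assumed data structures, not the authors' Algorithm 1: `edge_eliminate`, `node_eliminate`, and the toy cost values are all hypothetical, and the real algorithm additionally records the minimizing configurations so the optimal strategy can be reconstructed after the graph is reduced.

```python
# Hypothetical sketch of the node/edge elimination idea behind Algorithm 1.
# An edge cost table maps (config of source layer, config of destination layer)
# to the cheapest total cost of everything already eliminated between them;
# node_cost maps a configuration of the eliminated layer to its own compute
# cost under the assumed cost model.


def edge_eliminate(costs_a, costs_b):
    """Merge two parallel edges between the same endpoints by adding costs."""
    return {key: costs_a[key] + costs_b[key] for key in costs_a}


def node_eliminate(in_costs, node_cost, out_costs):
    """Eliminate a layer w with a single in-edge (u, w) and out-edge (w, v).

    For every pair (cu, cv) of endpoint configurations, pick the configuration
    cw of w that minimizes the combined cost, and record that minimum on a
    new (u, v) edge.
    """
    new_costs = {}
    for (cu, cw), cost_in in in_costs.items():
        for (cw2, cv), cost_out in out_costs.items():
            if cw != cw2:
                continue
            total = cost_in + node_cost[cw] + cost_out
            if total < new_costs.get((cu, cv), float("inf")):
                new_costs[(cu, cv)] = total
    return new_costs


if __name__ == "__main__":
    # Toy chain u -> w -> v; u and w each have two candidate configurations.
    in_costs = {("u0", "w0"): 1.0, ("u0", "w1"): 3.0,
                ("u1", "w0"): 2.0, ("u1", "w1"): 1.0}
    out_costs = {("w0", "v0"): 2.0, ("w1", "v0"): 0.5}
    node_cost = {"w0": 4.0, "w1": 1.0}
    print(node_eliminate(in_costs, node_cost, out_costs))
    # -> {('u0', 'v0'): 4.5, ('u1', 'v0'): 2.5}
```

Repeating these two reductions shrinks the graph while preserving the optimal cost, which is what lets the strategy search stay polynomial in practice.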
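
The data-parallel baseline settings quoted in the table (synchronous training, per-GPU batch size of 32, ImageNet-1K) can be expressed as a short training script. This is a hedged sketch using current PyTorch distributed APIs rather than the TensorFlow r1.7 / PyTorch v0.3 versions the paper ran; the dataset path is a placeholder and torchvision's ResNet-50 merely stands in for the paper's CNNs.

```python
# Sketch of a synchronous data-parallel baseline with 32 images per GPU.
# Assumes one process per GPU, launched e.g. with torchrun.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler
from torchvision import datasets, models, transforms


def main():
    dist.init_process_group(backend="nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # ImageNet-1K as referenced in the table; the path is a placeholder.
    train_set = datasets.ImageFolder(
        "/path/to/imagenet/train",
        transform=transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.ToTensor(),
        ]),
    )
    sampler = DistributedSampler(train_set)
    loader = DataLoader(train_set, batch_size=32, sampler=sampler,  # 32 per GPU
                        num_workers=4, pin_memory=True)

    model = DDP(models.resnet50().cuda(), device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()

    for images, labels in loader:
        images = images.cuda(non_blocking=True)
        labels = labels.cuda(non_blocking=True)
        opt.zero_grad()
        loss_fn(model(images), labels).backward()  # gradients all-reduced synchronously
        opt.step()


if __name__ == "__main__":
    main()
```

The per-GPU batch size of 32 means the effective global batch grows with the number of GPUs, which is the usual convention for the synchronous data-parallel comparisons described in the paper.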