Deep Tensor Convolution on Multicores

Authors: David Budden, Alexander Matveev, Shibani Santurkar, Shraman Ray Chaudhuri, Nir Shavit

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We benchmark 2D ConvNet performance against two popular frameworks: TensorFlow, using the newer Eigen 3.3 library (with AVX support); and Caffe, compiled to use Intel's optimized MKL library. We consider the propagation time of a 224×224 ImageNet image through three convolution layers to capture any necessary inter-layer reshuffling."
Researcher Affiliation | Academia | "Massachusetts Institute of Technology. Correspondence to: David Budden <budden@csail.mit.edu>."
Pseudocode | Yes | "Algorithm 1 Fast Vector Convolution"
Open Source Code | No | The paper does not state that source code for its own methodology is available. It cites third-party open-source projects such as "Wincnn. https://github.com/andravin/wincnn, 2016." and "Nnpack. https://github.com/Maratyszcza/NNPACK, 2016.", but not its own code.
Open Datasets | Yes | "We benchmark 2D ConvNet performance against two popular frameworks: TensorFlow, using the newer Eigen 3.3 library (with AVX support); and Caffe, compiled to use Intel's optimized MKL library. We consider the propagation time of a 224×224 ImageNet image through three convolution layers to capture any necessary inter-layer reshuffling."
Dataset Splits | No | The paper benchmarks the propagation of a 224×224 ImageNet image through convolution layers, but it does not specify training, validation, or test splits; its experiments measure operation throughput rather than model accuracy.
Hardware Specification | Yes | "We benchmarked the performance of our fast convolution algorithm on a 1.44 TFLOP/s Xeon E7-8890 CPU and observe that it executes at 70% maximum utilization. This includes all steps from input to output, including all necessary data reshuffling. As a point of comparison, Intel's own MKL convolutional primitive runs at just 20% (excluding reshuffling) on the same processor."
Software Dependencies | Yes | "TensorFlow, using the newer Eigen 3.3 library (with AVX support); and Caffe, compiled to use Intel's optimized MKL library." ... "We adopt the Cilk Plus work-stealing scheduler supported by GCC 4.8 (Blumofe et al., 1996; Robison, 2013)."
Experiment Setup | Yes | "We consider the propagation time of a 224×224 ImageNet image through three convolution layers to capture any necessary inter-layer reshuffling. We choose this simple architecture over a named network because we are not interested in comparing execution times of pooling, fully-connected or other layers. We also select an obscure kernel size (4×4) for which there have been no Winograd-style fast algorithms published, in order to demonstrate the generality of our implementation to arbitrary kernels. Each layer contains a modest 32 channels and 32 kernels for spreading the cost associated with applying transform matrices. Results presented are the fastest across batch sizes of 1, 20 and 200."
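
The Pseudocode row above refers to the paper's Algorithm 1 (Fast Vector Convolution), which generalizes Winograd-style fast convolution to arbitrary kernel sizes. As a point of reference only, the sketch below implements the classical Winograd F(2,3) instance to show the transform, elementwise-multiply, inverse-transform structure that such algorithms share; the fixed 3-tap kernel, the example data and the transform constants are illustrative assumptions, not the paper's Algorithm 1.

```cpp
// Illustrative Winograd F(2,3) fast convolution: two outputs of a 1D
// correlation with a 3-tap kernel using 4 multiplications instead of 6.
// Transform -> elementwise product -> inverse transform, as in fast
// vector convolution, but NOT the paper's Algorithm 1.
#include <array>
#include <cstdio>

int main() {
    // Input tile d (4 samples) and kernel g (3 taps): arbitrary example values.
    std::array<double, 4> d = {1.0, 2.0, 3.0, 4.0};
    std::array<double, 3> g = {0.5, -1.0, 2.0};

    // Input transform V = B^T d.
    std::array<double, 4> V = {d[0] - d[2], d[1] + d[2], -d[1] + d[2], d[1] - d[3]};
    // Kernel transform U = G g.
    std::array<double, 4> U = {g[0],
                               0.5 * (g[0] + g[1] + g[2]),
                               0.5 * (g[0] - g[1] + g[2]),
                               g[2]};
    // Elementwise product in the transform domain.
    std::array<double, 4> M;
    for (int i = 0; i < 4; ++i) M[i] = U[i] * V[i];
    // Output transform Y = A^T M yields two correlation outputs.
    double y0 = M[0] + M[1] + M[2];
    double y1 = M[1] - M[2] - M[3];

    // Direct correlation for comparison.
    double r0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2];
    double r1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2];
    std::printf("fast:   %f %f\n", y0, y1);
    std::printf("direct: %f %f\n", r0, r1);
}
```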
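
The Software Dependencies and Experiment Setup rows together describe the measured workload: three 4×4 convolution layers, each with 32 channels and 32 kernels, applied to a 224×224 input, with multicore parallelism handled by the Cilk Plus work-stealing scheduler. The sketch below is a hedged reconstruction of that harness, not the paper's implementation: it uses naive direct convolution as a stand-in for fast vector convolution, assumes valid (unpadded) convolution and a 32-channel input, and must be built with a GCC release that still ships Cilk Plus (g++ -O3 -fcilkplus).

```cpp
// Hedged reconstruction of the quoted benchmark: a 224x224 input propagated
// through three convolution layers (32 channels, 32 kernels, 4x4 kernels each),
// with the per-output-channel loop parallelized by the Cilk Plus scheduler.
#include <chrono>
#include <cstdio>
#include <vector>
#include <cilk/cilk.h>

// Valid (no padding) direct convolution: in[C][H][W] * w[K][C][4][4] -> out[K][H-3][W-3].
static void conv4x4(const std::vector<float>& in, const std::vector<float>& w,
                    std::vector<float>& out, int C, int K, int H, int W) {
    const int OH = H - 3, OW = W - 3;
    cilk_for (int k = 0; k < K; ++k) {  // independent output channels: work-stealing loop
        for (int y = 0; y < OH; ++y)
            for (int x = 0; x < OW; ++x) {
                float acc = 0.f;
                for (int c = 0; c < C; ++c)
                    for (int r = 0; r < 4; ++r)
                        for (int s = 0; s < 4; ++s)
                            acc += in[(c * H + (y + r)) * W + (x + s)] *
                                   w[((k * C + c) * 4 + r) * 4 + s];
                out[(k * OH + y) * OW + x] = acc;
            }
    }
}

int main() {
    const int C = 32, K = 32, Ksz = 4;
    int H = 224, W = 224;
    // Assumption: a 32-channel 224x224 input, since the quoted setup does not say
    // how the 3-channel RGB image is mapped to the 32-channel first layer.
    std::vector<float> x(C * H * W, 1.0f);
    std::vector<float> w(K * C * Ksz * Ksz, 0.01f);

    auto t0 = std::chrono::steady_clock::now();
    for (int layer = 0; layer < 3; ++layer) {
        const int OH = H - 3, OW = W - 3;
        std::vector<float> y(K * OH * OW);
        conv4x4(x, w, y, C, K, H, W);
        x.swap(y);  // output of one layer feeds the next
        H = OH; W = OW;
    }
    auto t1 = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    std::printf("3-layer forward pass: %.1f ms\n", ms);
}
```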
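
The Hardware Specification row quotes 70% utilization of a 1.44 TFLOP/s Xeon E7-8890, against roughly 20% for Intel's MKL primitive. The back-of-the-envelope arithmetic below converts those utilization figures into sustained throughput and sizes the direct-convolution cost of the quoted three-layer benchmark; the unpadded-convolution assumption is ours, and the FLOP count reflects direct convolution rather than the reduced arithmetic of the paper's fast algorithm.

```cpp
// Back-of-the-envelope throughput arithmetic for the quoted utilization figures.
#include <cstdio>

int main() {
    const double peak = 1.44e12;  // Xeon E7-8890 peak, FLOP/s (from the paper)
    std::printf("70%% utilization ~ %.2f TFLOP/s sustained\n", 0.70 * peak / 1e12);
    std::printf("20%% utilization ~ %.2f TFLOP/s sustained (MKL, reshuffling excluded)\n",
                0.20 * peak / 1e12);

    // Direct-convolution cost of the three 32->32 channel, 4x4-kernel layers on a
    // 224x224 input, assuming valid convolution (2 FLOPs per multiply-accumulate).
    double total = 0.0;
    int H = 224;
    for (int layer = 0; layer < 3; ++layer) {
        const int OH = H - 3;
        total += 2.0 * 32 * 32 * 4 * 4 * double(OH) * double(OH);
        H = OH;
    }
    std::printf("direct-conv work for three layers: %.2f GFLOP\n", total / 1e9);
}
```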