Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation

Authors: Emily L. Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, Rob Fergus

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using large state-of-the-art models, we demonstrate speedups of convolutional layers on both CPU and GPU by a factor of 2, while keeping the accuracy within 1% of the original model. We present results showing the performance of the approximations described in Section 3 in terms of prediction accuracy, speedup gains and reduction in memory overhead.
Researcher Affiliation | Academia | Emily Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun and Rob Fergus, Dept. of Computer Science, Courant Institute, New York University, {denton, zaremba, bruna, lecun, fergus}@cs.nyu.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper references Alex Krizhevsky’s CUDA convolution routines as a baseline (https://code.google.com/p/cuda-convnet/), but there is no explicit statement or link indicating that the authors' own code for the described methodology is open-source or publicly available.
Open Datasets | Yes | We use the 15 layer convolutional architecture of [8], trained on the ImageNet 2012 dataset [9].
Dataset Splits | Yes | All measurements of prediction performance are with respect to the 50K validation images from the ImageNet12 dataset.
Hardware Specification | Yes | All GPU code was run on a standard nVidia Titan card.
Software Dependencies | No | The paper mentions software like C++ with the Eigen3 library and Intel MKL, and Alex Krizhevsky's CUDA convolution routines, but it does not specify version numbers for these components.
Experiment Setup | Yes | All of our fine-tuning results were achieved by training with less than 2 passes using the ImageNet12 training dataset. Using the monochromatic approximation with 6 colors for the first layer and the biclustering with outer product decomposition approximation for the second layer (G = 48; H = 2; K = 8)
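For readers unfamiliar with the first-layer technique quoted in the Experiment Setup row, below is a minimal NumPy sketch of a monochromatic approximation: each filter is restricted to rank 1 along the RGB axis, and the resulting per-filter color vectors are clustered into a small number of shared colors (6 in the paper's setup). The function name, variable names, and tensor shapes are illustrative assumptions; this is not the authors' released code.

```python
# Hedged sketch of a monochromatic first-layer approximation, assuming
# filter weights W of shape (F, 3, d, d). Not the authors' implementation.
import numpy as np

def monochromatic_approx(W, num_colors=6):
    """Rank-1 color restriction per filter, with colors quantized to
    `num_colors` shared values via a naive k-means loop."""
    F, C, d, _ = W.shape
    colors = np.zeros((F, C))          # per-filter color component (scaled)
    mono = np.zeros((F, d, d))         # per-filter spatial component
    for f in range(F):
        Wf = W[f].reshape(C, d * d)                      # 3 x d^2 slice
        U, S, Vt = np.linalg.svd(Wf, full_matrices=False)
        colors[f] = U[:, 0] * S[0]                       # dominant color direction
        mono[f] = Vt[0].reshape(d, d)                    # dominant spatial pattern
    # Cluster the color vectors into num_colors shared colors.
    centers = colors[np.random.choice(F, num_colors, replace=False)]
    for _ in range(20):
        dists = ((colors[:, None] - centers[None]) ** 2).sum(-1)
        assign = np.argmin(dists, axis=1)
        for k in range(num_colors):
            if np.any(assign == k):
                centers[k] = colors[assign == k].mean(axis=0)
    # Reconstruct approximate filters: shared color outer spatial pattern.
    W_approx = np.einsum('fc,fij->fcij', centers[assign], mono)
    return W_approx, centers, assign, mono

# Example usage with made-up AlexNet-like first-layer dimensions:
# W = np.random.randn(96, 3, 7, 7)
# W_hat, centers, assign, mono = monochromatic_approx(W, num_colors=6)
```

At inference time the idea is that the input image only needs to be projected onto the few shared color channels once, after which each filter convolves a single monochromatic channel rather than all three.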
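The second-layer setting quoted above combines biclustering of input/output channels with an outer product decomposition (the G, H, K parameters). As a simpler, hedged stand-in for that idea, the sketch below applies a plain rank-K truncated SVD to the unfolded weight tensor; it illustrates the same compute and memory trade-off (K shared basis filters followed by a 1x1 recombination) but omits the biclustering step. Names and shapes are again assumptions for illustration.

```python
# Hedged sketch of a generic rank-K approximation of a convolutional
# weight tensor (F, C, d, d) via SVD of its unfolding. This is a
# simplified relative of the paper's decompositions, not its exact method.
import numpy as np

def rank_k_conv_approx(W, K):
    """Unfold W into an (F, C*d*d) matrix and keep its top-K singular
    directions, yielding two smaller factors usable as two cheaper layers."""
    F, C, d, _ = W.shape
    M = W.reshape(F, C * d * d)
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    U_k = U[:, :K] * S[:K]              # (F, K): per-output mixing weights
    V_k = Vt[:K].reshape(K, C, d, d)    # (K, C, d, d): K shared basis filters
    W_approx = (U_k @ Vt[:K]).reshape(F, C, d, d)
    return W_approx, U_k, V_k

# Example usage with made-up second-layer dimensions:
# W = np.random.randn(256, 96, 5, 5)
# W_hat, U_k, V_k = rank_k_conv_approx(W, K=16)
```

When K is much smaller than F and C*d*d, applying the K basis filters first and then recombining their feature maps with a 1x1 convolution requires far fewer multiply-accumulates and far less weight storage than the original layer, which is the source of the speedups and memory savings the review rows above refer to.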