Tensorizing Neural Networks

Authors: Alexander Novikov, Dmitrii Podoprikhin, Anton Osokin, Dmitry P. Vetrov

Venue: NeurIPS 2015

Each entry below gives a reproducibility variable, its assessed result, and the supporting LLM response.

Research Type: Experimental
LLM Response: "We apply our method to popular network architectures proposed for several datasets of different scales: MNIST [15], CIFAR-10 [12], ImageNet [13]. We experimentally show that the networks with the TT-layers match the performance of their uncompressed counterparts but require up to 200 000 times less of parameters, decreasing the size of the whole network by a factor of 7." (See the parameter-count sketch below.)

Researcher Affiliation: Academia
LLM Response: Alexander Novikov (1,4), Dmitry Podoprikhin (1), Anton Osokin (2), Dmitry Vetrov (1,3). Affiliations: 1 Skolkovo Institute of Science and Technology, Moscow, Russia; 2 INRIA, SIERRA project-team, Paris, France; 3 National Research University Higher School of Economics, Moscow, Russia; 4 Institute of Numerical Mathematics of the Russian Academy of Sciences, Moscow, Russia.

Pseudocode: No
LLM Response: The paper does not contain any structured pseudocode or algorithm blocks.

Open Source Code: Yes
LLM Response: "In all experiments we use our MATLAB extension of the MatConvNet framework [24]." The code is available at https://github.com/Bihaqo/TensorNet.

Open Datasets: Yes
LLM Response: "We apply our method to popular network architectures proposed for several datasets of different scales: MNIST [15], CIFAR-10 [12], ImageNet [13]."

Dataset Splits: Yes
LLM Response: "The dataset contains 50000 train and 10000 test images. We consider the 1000-class ImageNet ILSVRC-2012 dataset [19], which consist of 1.2 million training images and 50 000 validation images."

Hardware Specification: Yes
LLM Response: "The experiments were performed on a computer with a quad-core Intel Core i5-4460 CPU, 16 GB RAM and a single NVidia Geforce GTX 980 GPU."

Software Dependencies: No
LLM Response: The paper mentions using a "MATLAB extension of the MatConvNet framework" and the "TT-Toolbox implemented in MATLAB" but does not provide specific version numbers for these software dependencies.

Experiment Setup: Yes
LLM Response: "We train all the networks with stochastic gradient descent with momentum (coefficient 0.9). We initialize all the parameters of the TT- and fully-connected layers with a Gaussian noise and put L2-regularization (weight 0.0005) on them." (See the training-configuration sketch below.)
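
The compression figures quoted above follow from simple parameter arithmetic: a dense M x N matrix with M = prod(m_k) and N = prod(n_k) stores prod(m_k) * prod(n_k) numbers, while its TT representation stores sum_k r_{k-1} * m_k * n_k * r_k numbers across the cores. The Python sketch below illustrates this; the mode shapes and TT-ranks are hypothetical, chosen only to show the effect, and are not the factorizations used in the paper.

```python
# Illustrative sketch (not the authors' code): parameter count of a
# TT-matrix versus the dense weight matrix it replaces. The mode shapes
# and TT-ranks below are hypothetical, chosen only to show the effect.
import numpy as np

def dense_params(m_modes, n_modes):
    # A dense M x N matrix, with M = prod(m_modes) and N = prod(n_modes).
    return int(np.prod(m_modes)) * int(np.prod(n_modes))

def tt_params(m_modes, n_modes, ranks):
    # TT-cores G_k have shape r_{k-1} x (m_k * n_k) x r_k,
    # with boundary ranks r_0 = r_d = 1.
    assert len(m_modes) == len(n_modes)
    assert len(ranks) == len(m_modes) + 1 and ranks[0] == ranks[-1] == 1
    return sum(ranks[k] * m_modes[k] * n_modes[k] * ranks[k + 1]
               for k in range(len(m_modes)))

# Hypothetical 1024 x 1024 fully-connected layer, factored as 4*4*8*8
# on each side, with all internal TT-ranks set to 4.
m, n, r = [4, 4, 8, 8], [4, 4, 8, 8], [1, 4, 4, 4, 1]
print(dense_params(m, n))                       # 1048576
print(tt_params(m, n, r))                       # 1600
print(dense_params(m, n) / tt_params(m, n, r))  # ~655x compression
```

Larger matrices with more modes and small ranks push this ratio into the tens or hundreds of thousands, which is how compression factors on the order of those reported become possible.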
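For the experiment setup, here is a minimal sketch of an equivalent training configuration, assuming PyTorch as a stand-in for the authors' MATLAB/MatConvNet code. The architecture, learning rate, and noise scale are placeholders; the momentum coefficient (0.9) and L2-regularization weight (0.0005) are the values quoted from the paper.

```python
# Minimal sketch of the reported training configuration, assuming PyTorch
# as a stand-in for the authors' MATLAB/MatConvNet setup. The architecture,
# learning rate, and noise scale are placeholders; momentum 0.9 and
# L2 weight 0.0005 are the values quoted from the paper.
import torch.nn as nn
from torch.optim import SGD

# Placeholder network standing in for a net with TT- and fully-connected layers.
model = nn.Sequential(nn.Linear(784, 1024), nn.ReLU(), nn.Linear(1024, 10))

# "Initialize all the parameters ... with a Gaussian noise."
for p in model.parameters():
    nn.init.normal_(p, mean=0.0, std=0.01)  # std 0.01 is an assumption

optimizer = SGD(
    model.parameters(),
    lr=0.1,               # learning rate is not given in this summary; placeholder
    momentum=0.9,         # momentum coefficient from the paper
    weight_decay=0.0005,  # L2-regularization weight from the paper
)
```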