Tensorizing Neural Networks
Authors: Alexander Novikov, Dmitrii Podoprikhin, Anton Osokin, Dmitry P. Vetrov
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our method to popular network architectures proposed for several datasets of different scales: MNIST [15], CIFAR-10 [12], ImageNet [13]. We experimentally show that the networks with the TT-layers match the performance of their uncompressed counterparts but require up to 200,000 times fewer parameters, decreasing the size of the whole network by a factor of 7. (See the parameter-count sketch after the table.) |
| Researcher Affiliation | Academia | Alexander Novikov¹,⁴, Dmitry Podoprikhin¹, Anton Osokin², Dmitry Vetrov¹,³. ¹Skolkovo Institute of Science and Technology, Moscow, Russia; ²INRIA, SIERRA project-team, Paris, France; ³National Research University Higher School of Economics, Moscow, Russia; ⁴Institute of Numerical Mathematics of the Russian Academy of Sciences, Moscow, Russia |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | In all experiments we use our MATLAB extension of the MatConvNet framework [24]. https://github.com/Bihaqo/TensorNet |
| Open Datasets | Yes | We apply our method to popular network architectures proposed for several datasets of different scales: MNIST [15], CIFAR-10 [12], ImageNet [13]. |
| Dataset Splits | Yes | The dataset contains 50,000 train and 10,000 test images. We consider the 1000-class ImageNet ILSVRC-2012 dataset [19], which consists of 1.2 million training images and 50,000 validation images. |
| Hardware Specification | Yes | The experiments were performed on a computer with a quad-core Intel Core i5-4460 CPU, 16 GB RAM and a single NVIDIA GeForce GTX 980 GPU. |
| Software Dependencies | No | The paper mentions using a "MATLAB extension of the MatConvNet framework" and the "TT-Toolbox implemented in MATLAB" but does not provide version numbers for these software dependencies. |
| Experiment Setup | Yes | We train all the networks with stochastic gradient descent with momentum (coefficient 0.9). We initialize all the parameters of the TT- and fully-connected layers with Gaussian noise and put L2-regularization (weight 0.0005) on them. (A hedged sketch of this setup follows the table.) |
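
To make the quoted compression figure concrete, here is a minimal sketch (my own illustration, not code from the paper) of the TT-matrix parameter count: a fully connected weight matrix of size M × N, with M = ∏ m_k and N = ∏ n_k, is stored as cores of shape r_{k-1} × m_k × n_k × r_k, so the count drops from M·N to Σ_k r_{k-1}·m_k·n_k·r_k. The factorization and ranks below are hypothetical, chosen only to show the arithmetic.

```python
# Parameter count of a TT-matrix (illustrative sketch, not the authors' code).
# An M x N matrix with M = prod(ms), N = prod(ns) is stored as d cores
# G_k of shape (r_{k-1}, m_k, n_k, r_k), with boundary ranks r_0 = r_d = 1.

def tt_param_count(ms, ns, ranks):
    """Number of parameters in the TT representation of an M x N matrix."""
    assert len(ms) == len(ns) == len(ranks) - 1
    return sum(ranks[k] * ms[k] * ns[k] * ranks[k + 1]
               for k in range(len(ms)))

# Hypothetical factorization of a 1024 x 1024 layer: four modes of size
# 4, 8, 8, 4 on each side, all internal TT-ranks equal to 4.
ms, ns = [4, 8, 8, 4], [4, 8, 8, 4]
ranks = [1, 4, 4, 4, 1]
full = 1024 * 1024                  # 1,048,576 parameters in the dense layer
tt = tt_param_count(ms, ns, ranks)  # 64 + 1024 + 1024 + 64 = 2,176
print(full, tt, full / tt)          # compression ratio ~482x here
```

Larger layers and smaller ranks push this ratio much higher, which is how the paper's reported factors of up to 200,000× for a single layer become plausible.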
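And a minimal PyTorch sketch of the quoted training configuration. This is an assumption on my part: the authors used their MATLAB/MatConvNet code, and the learning rate, noise scale, and architecture below are placeholders rather than values from the paper; only the momentum coefficient, Gaussian initialization, and L2 weight come from the quoted text.

```python
# Sketch of the reported training setup (hypothetical PyTorch rendering).
import torch
import torch.nn as nn

# Placeholder architecture; the paper's networks use TT-layers instead of
# plain nn.Linear.
model = nn.Sequential(nn.Linear(784, 1024), nn.ReLU(), nn.Linear(1024, 10))

# Gaussian initialization of all layer parameters; the std is an assumed
# value, as the quoted text does not specify the noise scale.
for p in model.parameters():
    nn.init.normal_(p, mean=0.0, std=0.01)

# SGD with momentum 0.9; weight_decay plays the role of the stated
# L2-regularization weight of 0.0005. The learning rate is a placeholder.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=0.0005)
```

Note that PyTorch's `weight_decay` adds λ·w to each gradient, which corresponds to an L2 penalty of (λ/2)·‖w‖²; the paper's stated weight of 0.0005 is used here under that reading.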