Expressive power of recurrent neural networks

Authors: Valentin Khrulkov, Alexander Novikov, Ivan Oseledets

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we prove the expressive power theorem (an exponential lower bound on the width of the equivalent shallow network) for a class of recurrent neural networks, namely those that correspond to the Tensor Train (TT) decomposition. This means that even processing an image patch by patch with an RNN can be exponentially more efficient than a (shallow) convolutional network with one hidden layer. Using theoretical results on the relation between the tensor decompositions, we compare the expressive powers of the HT- and TT-Networks. We also implement the recurrent TT-Networks and provide numerical evidence of their expressivity.
Researcher Affiliation | Academia | Valentin Khrulkov (Skolkovo Institute of Science and Technology, valentin.khrulkov@skolkovotech.ru); Alexander Novikov (National Research University Higher School of Economics; Institute of Numerical Mathematics RAS, novikov@bayesgroup.ru); Ivan Oseledets (Skolkovo Institute of Science and Technology; Institute of Numerical Mathematics RAS, i.oseledets@skoltech.ru)
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statement about releasing source code for the described methodology, nor a link to a code repository.
Open Datasets | Yes | For the next experiments, we use computer vision datasets MNIST (Le Cun et al. (1990)) and CIFAR-10 (Krizhevsky & Hinton (2009)).
Dataset Splits | No | The paper mentions 'batch size 32' and discusses train/test accuracy, but does not provide specific details on how the datasets were split into training, validation, and test sets (e.g., percentages or sample counts).
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are provided in the paper.
Software Dependencies | No | To train the TT- and CP-Networks, we implemented them in TensorFlow (Abadi et al. (2015)). No specific version number for TensorFlow or other software dependencies is mentioned.
Experiment Setup | Yes | To train the TT- and CP-Networks, we implemented them in TensorFlow (Abadi et al. (2015)) and used the Adam optimizer with batch size 32 and a learning rate swept across {4e-3, 2e-3, 1e-3, 5e-4}.
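
To give a concrete picture of the construction described under Research Type, the following is a minimal NumPy sketch (not the authors' code, since none is released) of how a TT-Network scores an input processed patch by patch. The patch count, patch dimension, feature map, and TT-ranks below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative sketch of a TT-Network score, assuming:
#   - T input patches x_1, ..., x_T,
#   - a feature map f mapping each patch to an m-dimensional vector,
#   - TT-cores G[k] of shape (r_{k-1}, m, r_k) with boundary ranks r_0 = r_T = 1.
# The score is obtained by contracting each core with the patch features and
# multiplying by the previous hidden state, i.e. a recurrent network with
# multiplicative (bilinear) units, which is the view taken in the paper.

def tt_score(patches, cores, feature_map):
    h = np.ones(1)  # boundary rank r_0 = 1
    for x, G in zip(patches, cores):
        fx = feature_map(x)                       # shape (m,)
        A = np.tensordot(G, fx, axes=([1], [0]))  # (r_prev, m, r_next) -> (r_prev, r_next)
        h = h @ A                                 # recurrent state, shape (r_next,)
    return h.item()                               # r_T = 1, so the result is a scalar score

# Toy usage with made-up sizes: 4 patches of dimension 5, feature dim m = 3, TT-rank 2.
rng = np.random.default_rng(0)
T, patch_dim, m, r = 4, 5, 3, 2
ranks = [1] + [r] * (T - 1) + [1]
cores = [rng.standard_normal((ranks[k], m, ranks[k + 1])) for k in range(T)]
patches = [rng.standard_normal(patch_dim) for _ in range(T)]
P = rng.standard_normal((m, patch_dim))           # placeholder linear feature map
print(tt_score(patches, cores, lambda x: np.tanh(P @ x)))
```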
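Since the Experiment Setup row reports only the optimizer, batch size, and learning-rate sweep, the sketch below shows one plausible way to reproduce that training protocol in TensorFlow/Keras. Only the Adam optimizer, batch size 32, and the learning-rate grid come from the paper; the dense stand-in model, epoch count, and MNIST loading utility are assumptions made here for the sake of a runnable example.

```python
import tensorflow as tf

# Hedged sketch of the reported training protocol (not the authors' code):
# Adam optimizer, batch size 32, learning rate swept over {4e-3, 2e-3, 1e-3, 5e-4}.
# A plain dense classifier stands in for the recurrent TT-Network, which the
# paper does not release; substitute the actual model where indicated.

LEARNING_RATES = [4e-3, 2e-3, 1e-3, 5e-4]
BATCH_SIZE = 32

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

for lr in LEARNING_RATES:
    # Stand-in model; the recurrent TT-Network would be constructed here instead.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    model.fit(x_train, y_train, batch_size=BATCH_SIZE, epochs=5,
              validation_data=(x_test, y_test))
```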