Expressive power of recurrent neural networks
Authors: Valentin Khrulkov, Alexander Novikov, Ivan Oseledets
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we prove the expressive power theorem (an exponential lower bound on the width of the equivalent shallow network) for a class of recurrent neural networks -- ones that correspond to the Tensor Train (TT) decomposition. This means that even processing an image patch by patch with an RNN can be exponentially more efficient than a (shallow) convolutional network with one hidden layer. Using theoretical results on the relation between the tensor decompositions, we compare expressive powers of the HT- and TT-Networks. We also implement the recurrent TT-Networks and provide numerical evidence of their expressivity. |
| Researcher Affiliation | Academia | Valentin Khrulkov, Skolkovo Institute of Science and Technology, valentin.khrulkov@skolkovotech.ru; Alexander Novikov, National Research University Higher School of Economics / Institute of Numerical Mathematics RAS, novikov@bayesgroup.ru; Ivan Oseledets, Skolkovo Institute of Science and Technology / Institute of Numerical Mathematics RAS, i.oseledets@skoltech.ru |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | For the next experiments, we use computer vision datasets MNIST (Le Cun et al. (1990)) and CIFAR10 (Krizhevsky & Hinton (2009)). |
| Dataset Splits | No | The paper mentions 'batch size 32' and discusses train/test accuracy but does not provide specific details on how the datasets were split into training, validation, and test sets (e.g., percentages or sample counts). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are provided in the paper. |
| Software Dependencies | No | To train the TT- and CP-Networks, we implemented them in TensorFlow (Abadi et al. (2015)). No specific version number for TensorFlow or other software dependencies is mentioned. |
| Experiment Setup | Yes | To train the TT- and CP-Networks, we implemented them in TensorFlow (Abadi et al. (2015)) and used Adam optimizer with batch size 32 and learning rate sweeping across {4e-3, 2e-3, 1e-3, 5e-4} values. |
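
The recurrent TT-Network quoted in the Research Type row scores an input by splitting it into patches, mapping each patch to a feature vector, and contracting those vectors with Tensor Train cores, which is structurally an RNN with a multiplicative hidden-state update whose width equals the TT rank. A minimal NumPy sketch of that contraction (an illustration of the TT construction, not the authors' implementation; the `tt_network_score` helper and the toy shapes are assumptions):

```python
import numpy as np

def tt_network_score(features, cores):
    """features: list of T patch-feature vectors of length m;
    cores: list of T TT-cores of shape (r_prev, m, r_next), with r_0 = r_T = 1."""
    h = np.ones(1)  # boundary rank r_0 = 1
    for f_t, G_t in zip(features, cores):
        # RNN-style update: contract the hidden state and the patch feature with the core
        h = np.einsum('i,imj,m->j', h, G_t, f_t)
    return h[0]  # boundary rank r_T = 1 yields a scalar score

# Toy usage: 8 patches, 16-dimensional patch features, TT rank 4
T, m, r = 8, 16, 4
ranks = [1] + [r] * (T - 1) + [1]
features = [np.random.randn(m) for _ in range(T)]
cores = [np.random.randn(ranks[t], m, ranks[t + 1]) for t in range(T)]
print(tt_network_score(features, cores))
```

The hidden state here has size equal to the current TT rank, which is the quantity the paper's expressivity comparison is stated in terms of: matching a modest TT rank with a shallow (one-hidden-layer) network can require exponentially many hidden units.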
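
The Experiment Setup row amounts to a standard Adam training loop with batch size 32 and a sweep over four learning rates. A hedged TensorFlow/Keras sketch of that configuration on MNIST (the placeholder architecture and the epoch count are assumptions; the paper does not release code, so only the optimizer, batch size, learning rates, and dataset come from the text):

```python
import tensorflow as tf

def build_model():
    # Placeholder single-hidden-layer model standing in for the paper's
    # TT-/CP-Networks, which are not reproduced here.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])

learning_rates = [4e-3, 2e-3, 1e-3, 5e-4]  # sweep reported in the paper

# MNIST, one of the two public datasets the paper uses
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

for lr in learning_rates:
    model = build_model()
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=["accuracy"])
    # Adam with batch size 32, as stated in the paper; epoch count assumed
    model.fit(x_train, y_train, batch_size=32, epochs=5)
    print(f"lr={lr}: test accuracy", model.evaluate(x_test, y_test, verbose=0)[1])
```

The test set is used only for reporting accuracy after each run, since the paper does not specify a validation split.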