Tensor-Train Recurrent Neural Networks for Video Classification

Authors: Yinchong Yang, Denis Krompass, Volker Tresp

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We test our model on classification tasks using multiple real-world video datasets and achieve competitive performances with state-of-the-art models, even though our model architecture is orders of magnitude less complex." "In the following, we present our experiments conducted on three large video datasets. These empirical results demonstrate that the integration of the Tensor-Train Layer in plain RNN architectures such as a tensorized LSTM or GRU boosts the classification quality of these models tremendously when directly exposed to high-dimensional input data, such as video data." (A minimal TT-layer sketch follows the table.)
Researcher Affiliation | Collaboration | (1) Ludwig Maximilian University of Munich, Germany; (2) Siemens AG, Corporate Technology, Germany.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The source codes of our TT-RNN implementations and all the experiments in Sec. 4 are publicly available at https://github.com/Tuyki/TT_RNN."
Open Datasets | Yes | UCF11 Data (Liu et al., 2009), Hollywood2 Data (Marszałek et al., 2009), and Youtube Celebrities Face Data (Kim et al., 2008).
Dataset Splits | Yes | "We follow (Liu et al., 2013) and perform for each experimental setting a 5-fold cross validation with mutually exclusive data splits." (See the cross-validation sketch below the table.)
Hardware Specification | Yes | "The models were trained on a quad-core Intel® Xeon® E7-4850 v2 2.30GHz processor to a maximum of 100 epochs." "The models were trained on an NVIDIA Tesla K40c processor to a maximum of 500 epochs."
Software Dependencies | No | The paper mentions "Theano (Bastien et al., 2012)" and "Keras (Chollet, 2015)" but does not provide version numbers for these software dependencies.
Experiment Setup | Yes | "We applied 0.25 Dropout (Srivastava et al., 2014) for both input-to-hidden and hidden-to-hidden mappings in plain GRU and LSTM as well as their respective TT modifications; and 0.01 ridge regularization for the single-layered classifier. We used the Adam (Kingma & Ba, 2014) step rule for the updates with an initial learning rate of 0.001." (See the training-configuration sketch below the table.)
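
To make the quoted Tensor-Train claim concrete: a TT layer replaces the dense input-to-hidden matrix W with a chain of small 4-way cores, and the matrix-vector product is computed by contracting those cores one at a time. Below is a minimal NumPy sketch of that contraction, assuming cores of shape (r_{k-1}, m_k, n_k, r_k); the function name `tt_matvec` and the mode/rank choices in the example are illustrative assumptions, not the authors' implementation (which is available at the repository linked above).

```python
import numpy as np

def tt_matvec(cores, x, in_modes):
    """Compute y = W x for a TT-factorized matrix W.

    cores[k] has shape (r_{k-1}, m_k, n_k, r_k), with boundary ranks
    r_0 = r_d = 1; x has length prod(in_modes) = prod(m_k).
    """
    # Reshape the flat input into a d-way tensor with a dummy leading rank axis.
    res = x.reshape((1,) + tuple(in_modes))
    for core in cores:
        # Contract the current rank axis and the next input mode with the core;
        # tensordot appends the core's free axes (n_k, r_k) at the end.
        res = np.tensordot(res, core, axes=([0, 1], [0, 1]))
        # Bring the new rank axis r_k back to the front for the next step.
        res = np.moveaxis(res, -1, 0)
    # Remaining shape is (1, n_1, ..., n_d); flatten to the output vector.
    return res.reshape(-1)

# Illustrative modes/ranks: a 256-dim input mapped to a 64-dim hidden state.
rng = np.random.default_rng(0)
in_modes, out_modes, ranks = (4, 8, 8), (4, 4, 4), (1, 3, 3, 1)
cores = [rng.standard_normal((ranks[k], in_modes[k], out_modes[k], ranks[k + 1]))
         for k in range(3)]
x = rng.standard_normal(256)
h = tt_matvec(cores, x, in_modes)  # shape (64,)
# These cores hold 48 + 288 + 96 = 432 parameters vs. 256 * 64 = 16384 for a
# dense matrix, the kind of compression the paper exploits for raw video frames.
```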
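The 5-fold protocol quoted in the Dataset Splits row can be set up generically with scikit-learn. This is only a sketch under stated assumptions: the variable names and clip counts are placeholders, and the authors' actual fold assignment follows (Liu et al., 2013) rather than a random shuffle.

```python
import numpy as np
from sklearn.model_selection import KFold

# Placeholder data: one entry per video clip (UCF11 has 11 action classes).
n_clips = 1600
X = np.arange(n_clips)                      # stand-ins for per-clip features
y = np.random.randint(0, 11, size=n_clips)  # stand-in labels

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    # Mutually exclusive splits: every clip lands in exactly one test fold.
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test clips")
```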
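Likewise, the hyperparameters quoted in the Experiment Setup row translate naturally into a Keras model definition. The sketch below uses present-day tf.keras rather than the Theano-backed Keras of the original code, and the hidden size, input shape, and class count are illustrative placeholders; only the dropout rate, ridge coefficient, and Adam learning rate come from the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

SEQ_LEN, FRAME_DIM, N_CLASSES = 6, 160 * 120 * 3, 11  # placeholder dimensions

model = keras.Sequential([
    keras.Input(shape=(SEQ_LEN, FRAME_DIM)),
    # 0.25 dropout on both the input-to-hidden and hidden-to-hidden mappings,
    # as reported (shown here on a plain GRU; the TT variant lives in the repo).
    layers.GRU(256, dropout=0.25, recurrent_dropout=0.25),
    # Single-layer classifier with 0.01 ridge (L2) regularization.
    layers.Dense(N_CLASSES, activation="softmax",
                 kernel_regularizer=regularizers.l2(0.01)),
])

model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])
```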