Convolutional Tensor-Train LSTM for Spatio-Temporal Learning

Authors: Jiahao Su, Wonmin Byeon, Jean Kossaifi, Furong Huang, Jan Kautz, Anima Anandkumar

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type: Experimental (4 experiments) — "Here, we empirically evaluate our approach on several datasets for two different tasks: video prediction and early activity recognition, and find that it outperforms existing approaches. Evaluation. For video prediction, the model predicts every pixel in the frame. We test our proposed models on the KTH human action dataset [20] with resolution 128 × 128 and the Moving-MNIST-2 dataset [2] with resolution 64 × 64."
Researcher Affiliation: Collaboration — "1University of Maryland, College Park, MD; 2NVIDIA Research, Santa Clara, CA"
Pseudocode: Yes — "The full procedure can be found in Appendix A (Algorithm 2)."
Open Source Code: Yes — "Both versions are available online: https://github.com/NVlabs/conv-tt-lstm."
Open Datasets: Yes — "We test our proposed models on the KTH human action dataset [20] with resolution 128 × 128 and the Moving-MNIST-2 dataset [2] with resolution 64 × 64. For early activity recognition, we evaluate our approach on the Something-Something V2 dataset. Following [7], we used the subset of 41 categories defined by Goyal et al. [21] (Table 7)."
Dataset Splits: No — The paper states that it validates hyper-parameters on a "validation set" but does not provide the specific split percentages or counts for the training, validation, and test sets needed to reproduce the experiment.
Hardware Specification: No — The paper mentions optimizing for "GPUs" and "CPUs" and refers to the "NVIDIA apex library" and "CUDA multi-streams", but it does not provide specific GPU or CPU model numbers or other detailed hardware specifications used for running the experiments.
Software Dependencies: No — The paper mentions using the "NVIDIA apex library", the "ADAM optimizer", and "Torch Script" for efficient implementation, but it does not specify version numbers for any of these software components.
Experiment Setup: Yes — "Hyper-parameter selection. We validate the hyper-parameters of our Conv-TT-LSTM through a wide grid search on the validation set. Specifically, we consider a base filter size S ∈ {3, 5}, order of the decomposition N ∈ {1, 2, 3, 5}, tensor ranks C(i) ∈ {4, 8, 16}, and number of hidden states M ∈ {1, 3, 5}. Appendix B contains the details of our hyper-parameter search."
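The quoted grid-search setup can be sketched as a straightforward enumeration of the reported hyper-parameter ranges. This is only an illustrative sketch: the value lists come from the paper's quoted text, but the variable names and the idea of scoring each configuration with a validation routine are assumptions, not the authors' released code.

```python
from itertools import product

# Hyper-parameter ranges quoted in the paper's "Hyper-parameter selection" paragraph.
filter_sizes = [3, 5]        # base filter size S
orders = [1, 2, 3, 5]        # order of the tensor-train decomposition N
ranks = [4, 8, 16]           # tensor ranks C(i)
hidden_states = [1, 3, 5]    # number of hidden states M

# Enumerate every configuration in the grid (2 * 4 * 3 * 3 = 72 combinations).
grid = list(product(filter_sizes, orders, ranks, hidden_states))
print(len(grid))  # → 72

# In an actual search, each configuration would be trained and scored on the
# validation set, e.g.:
#   best = min(grid, key=lambda cfg: validation_loss(train_model(*cfg)))
# where train_model and validation_loss are hypothetical placeholders.
```

A full grid of 72 configurations is small enough to evaluate exhaustively, which is consistent with the paper describing a "wide grid search" rather than a randomized or adaptive one.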