Recurrent Mixture Density Network for Spatiotemporal Visual Attention
Authors: Loris Bazzani, Hugo Larochelle, Lorenzo Torresani
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on Hollywood2 show state-of-the-art performance on saliency prediction for video. We also show that our attentional model trained on Hollywood2 generalizes well to UCF101 and it can be leveraged to improve action classification accuracy on both datasets. |
| Researcher Affiliation | Collaboration | Loris Bazzani, Amazon, Berlin, Germany, bazzanil@amazon.com; Hugo Larochelle, Département d'informatique, Université de Sherbrooke, hugo.larochelle@usherbrooke.ca; Lorenzo Torresani, Department of Computer Science, Dartmouth College, lt@dartmouth.edu |
| Pseudocode | No | The paper provides equations and diagrams of the model architecture but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not state that the code for the described methodology is open-sourced, and it provides no repository link. |
| Open Datasets | Yes | Therefore, we used the Hollywood2 dataset, which was augmented with eye tracking data by Mathe & Sminchisescu (2015). ... and UCF101 (Soomro et al., 2012). |
| Dataset Splits | Yes | We use a validation set consisting of 20% of the training set. We use the remaining 80% of the training data to learn our models, and use the hold-out validation set to choose the hyperparameters of our model. |
| Hardware Specification | Yes | All the experiments were carried out using an NVIDIA Tesla K40 card. |
| Software Dependencies | No | The paper mentions using a 'pretrained C3D network' and 'RMSprop' for training, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The training of the RMDN is performed using RMSprop with adaptive learning rate and gradient clipping. We start from a learning rate of 0.0003 and after 8 epochs it is reduced at each epoch with a decay factor of 0.95. The gradient is clipped with a threshold of 20. Dropout with a ratio of 0.5 is applied only on the hidden layer of the LSTM network before the MDN. We trained for 40 epochs, but training is stopped if there is no significant improvement of the loss. ... The number of components of the GMM C is fixed to 20 for all the experiments. ... For all the experiments, we used 20% of the training data as validation set to find the regularization parameter of the SVM. We searched the parameter space on a grid between 10^-9 and 10^3 with a multiplicative step of 10^(1/2). |
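
The learning-rate schedule and SVM search grid reported above can be sketched directly. The snippet below is a minimal, framework-free illustration of those two reported settings only (the paper does not specify the deep-learning framework used, and the function names here are hypothetical):

```python
# Learning-rate schedule reported in the paper: start at 3e-4, hold it fixed
# for the first 8 epochs, then multiply by a decay factor of 0.95 each epoch.
def learning_rate(epoch, base_lr=3e-4, hold_epochs=8, decay=0.95):
    if epoch < hold_epochs:
        return base_lr
    return base_lr * decay ** (epoch - hold_epochs + 1)

# SVM regularization grid: 10^-9 to 10^3 with a multiplicative step of 10^(1/2),
# i.e. exponents -9, -8.5, ..., 2.5, 3 (25 values in total).
svm_grid = [10 ** (0.5 * k) for k in range(-18, 7)]
```

Gradient clipping at a threshold of 20 and dropout of 0.5 on the LSTM hidden layer would be applied inside the training loop, which is omitted here.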