Recurrent Mixture Density Network for Spatiotemporal Visual Attention
Authors: Loris Bazzani, Hugo Larochelle, Lorenzo Torresani
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on Hollywood2 show state-of-the-art performance on saliency prediction for video. We also show that our attentional model trained on Hollywood2 generalizes well to UCF101 and it can be leveraged to improve action classification accuracy on both datasets. |
| Researcher Affiliation | Collaboration | Loris Bazzani, Amazon, Berlin, Germany, bazzanil@amazon.com; Hugo Larochelle, Département d'informatique, Université de Sherbrooke, hugo.larochelle@usherbrooke.ca; Lorenzo Torresani, Department of Computer Science, Dartmouth College, lt@dartmouth.edu |
| Pseudocode | No | The paper provides equations and diagrams of the model architecture but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not state that the code for the described methodology is open-sourced, and it provides no repository link. |
| Open Datasets | Yes | Therefore, we used the Hollywood2 dataset, which was augmented with eye tracking data by Mathe & Sminchisescu (2015). ... and UCF101 (Soomro et al., 2012). |
| Dataset Splits | Yes | We use a validation set consisting of 20% of the training set. We use the remaining 80% of the training data to learn our models, and use the hold-out validation set to choose the hyperparameters of our model. |
| Hardware Specification | Yes | All the experiments were carried out using an NVIDIA Tesla K40 card. |
| Software Dependencies | No | The paper mentions using a 'pretrained C3D network' and 'RMSprop' for training, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The training of the RMDN is performed using RMSprop with adaptive learning rate and gradient clipping. We start from a learning rate of 0.0003 and after 8 epochs it is reduced at each epoch with a decay factor of 0.95. The gradient is clipped with a threshold of 20. Dropout with a ratio of 0.5 is applied only on the hidden layer of the LSTM network before the MDN. We trained for 40 epochs, but training is stopped if there is no significant improvement of the loss. ... The number of components of the GMM C is fixed to 20 for all the experiments. ... For all the experiments, we used 20% of the training data as validation set to find the regularization parameter of the SVM. We searched the parameter space on a grid between 10^-9 and 10^3 with a multiplicative step of 10^(1/2). |
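
The learning-rate schedule and SVM search grid reported above can be sketched directly. The snippet below is a minimal, framework-free illustration of those two reported settings only (the paper does not specify the deep-learning framework used, and the function names here are hypothetical):

```python
# Learning-rate schedule reported in the paper: start at 3e-4, hold it fixed
# for the first 8 epochs, then multiply by a decay factor of 0.95 each epoch.
def learning_rate(epoch, base_lr=3e-4, hold_epochs=8, decay=0.95):
    if epoch < hold_epochs:
        return base_lr
    return base_lr * decay ** (epoch - hold_epochs + 1)

# SVM regularization grid: 10^-9 to 10^3 with a multiplicative step of 10^(1/2),
# i.e. exponents -9, -8.5, ..., 2.5, 3 (25 values in total).
svm_grid = [10 ** (0.5 * k) for k in range(-18, 7)]
```

Gradient clipping at a threshold of 20 and dropout of 0.5 on the LSTM hidden layer would be applied inside the training loop, which is omitted here.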