Motion Guided Spatial Attention for Video Captioning
Authors: Shaoxiang Chen, Yu-Gang Jiang (pp. 8191-8198)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on two benchmark datasets, MSVD and MSR-VTT. The experiments show that our designed model can generate better video representation and state of the art results are obtained under popular evaluation metrics such as BLEU@4, CIDEr, and METEOR. |
| Researcher Affiliation | Academia | Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University Shanghai Institute of Intelligent Electronics & Systems {sxchen13, ygj}@fudan.edu.cn |
| Pseudocode | No | The paper describes the architecture and computations in text and diagrams (Figure 2) but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | "All the components of our model and training are implemented in Tensorflow" (footnote: https://github.com/tensorflow/tensorflow). The cited link points to the TensorFlow library itself, not to an implementation of this paper. The paper does not state that the authors' code is open source or available. |
| Open Datasets | Yes | The MSVD dataset (Chen and Dolan 2011) is a widely used benchmark dataset for video captioning methods. The MSR-VTT dataset (Xu et al. 2016) is a large scale open-domain video captioning dataset. |
| Dataset Splits | Yes | For MSVD: "In our experiments, we follow the split settings in prior works (Xu et al. 2017; Yao et al. 2015): 1,200 videos for training, 100 videos for validation and 670 videos for testing." For MSR-VTT: "We follow the standard dataset split in the dataset paper: 6,513 videos for training, 497 videos for validation and 2,990 videos for testing." |
| Hardware Specification | Yes | On a commodity GTX 1080 Ti GPU, the times needed to extract frame features and optical flows for a typical 10-second video clip are 400ms and 800ms, respectively. |
| Software Dependencies | No | "All the components of our model and training are implemented in Tensorflow." The paper names TensorFlow but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | The LSTMs used in our model all have 1024 hidden units and the word embedding size is set to 512. ... We apply dropout with rate of 0.5 to all the vertical connections of LSTMs and L2 regularization with a factor of 5 × 10⁻⁵ to all the trainable parameters to mitigate overfitting. We apply the ADAM optimizer with a learning rate of 10⁻⁴ and batch size of 32 to minimize the negative log-likelihood loss. |
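For readers attempting a reimplementation, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. The dictionary keys below are illustrative names chosen here, not identifiers from the authors' (unreleased) code; the values are those reported in the paper.

```python
# Hyperparameters as reported in the paper's experiment setup.
# Key names are hypothetical; only the values come from the paper.
train_config = {
    "lstm_hidden_units": 1024,          # all LSTMs in the model
    "word_embedding_size": 512,
    "dropout_rate": 0.5,                # vertical LSTM connections only
    "l2_factor": 5e-5,                  # applied to all trainable parameters
    "optimizer": "adam",
    "learning_rate": 1e-4,
    "batch_size": 32,
    "loss": "negative_log_likelihood",
}
```

A reimplementation would still need details the paper does not report, such as the TensorFlow version, gradient clipping, and the number of training epochs.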