MAM-RNN: Multi-level Attention Model Based RNN for Video Captioning

Authors: Xuelong Li, Bin Zhao, Xiaoqiang Lu

IJCAI 2017

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Practically, the experimental results on two benchmark datasets, i.e., MSVD and Charades, have shown the excellent performance of the proposed approach." |
| Researcher Affiliation | Academia | 1) Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, P. R. China; 2) School of Computer Science and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an 710072, P. R. China |
| Pseudocode | No | The paper describes its approach using mathematical equations (Equ. (1)–(19)) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper makes no explicit statement about the availability of open-source code and includes no link to a code repository. |
| Open Datasets | Yes | "The MSVD dataset [Guadarrama et al., 2013] is composed of 1970 video clips downloaded from YouTube." Each clip typically depicts a single open-domain activity and is annotated with multilingual captions. |
| Dataset Splits | Yes | The MSVD dataset is split into a training set of 1,200 videos, a validation set of 100 videos, and a test set of the remaining 670 videos. |
| Hardware Specification | No | The paper mentions CNNs such as VGGNet-16, GoogLeNet, and C3D for feature extraction, but gives no details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions a "300-dimensional GloVe vector [Pennington et al., 2014]" for text features and several CNNs, but provides no version numbers for any software dependencies or libraries. |
| Experiment Setup | No | The paper describes the model architecture, the feature extraction process, and the training objective (log-likelihood, Equ. (12)), but does not specify concrete hyperparameters such as learning rate, batch size, number of epochs, or optimizer settings. |
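The 1200/100/670 split reported in the Dataset Splits row can be expressed as a simple index partition. This is a minimal sketch assuming clips are numbered 0–1969; the paper itself does not restate which clips land in which split, so the specific index ranges below are illustrative only:

```python
# Partition the 1970 MSVD clip indices into train/val/test subsets
# with the sizes reported in the paper: 1200 / 100 / 670.
# NOTE: the contiguous index ranges are an assumption for illustration;
# the paper does not specify the exact clip-to-split assignment.
NUM_CLIPS = 1970
clip_ids = list(range(NUM_CLIPS))

train_ids = clip_ids[:1200]        # first 1200 clips -> training
val_ids   = clip_ids[1200:1300]    # next 100 clips  -> validation
test_ids  = clip_ids[1300:]        # remaining 670   -> testing

print(len(train_ids), len(val_ids), len(test_ids))  # 1200 100 670
```

The three subsets are disjoint and cover every clip exactly once, matching the "remaining 670 videos" phrasing in the paper.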