MAM-RNN: Multi-level Attention Model Based RNN for Video Captioning
Authors: Xuelong Li, Bin Zhao, Xiaoqiang Lu
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Practically, the experimental results on two benchmark datasets, i.e., MSVD and Charades, have shown the excellent performance of the proposed approach. |
| Researcher Affiliation | Academia | ¹Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, P. R. China; ²School of Computer Science and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an 710072, P. R. China |
| Pseudocode | No | The paper describes its approach using mathematical equations (Eqs. (1)-(19)) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about the availability of open-source code, nor does it include links to a code repository. |
| Open Datasets | Yes | The MSVD dataset [Guadarrama et al., 2013] is composed of 1970 video clips downloaded from YouTube. Each video clip typically describes a single activity in an open domain and is annotated with multilingual captions. |
| Dataset Splits | Yes | The MSVD dataset is split into a training set of 1200 videos, a validation set of 100 videos, and a testing set of the remaining 670 videos. |
| Hardware Specification | No | The paper mentions the use of CNNs such as VGGNet-16, GoogLeNet, and C3D for feature extraction, but does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions '300-dimensional GloVe vector [Pennington et al., 2014]' for text features and various CNNs, but does not provide specific version numbers for any software dependencies or libraries used. |
| Experiment Setup | No | The paper describes the model architecture, feature extraction process, and the training objective function (log-likelihood, Eq. (12)), but it does not specify concrete hyperparameters such as learning rate, batch size, number of epochs, or optimizer settings for the experimental setup. |
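The reported MSVD split (1200 train / 100 validation / 670 test) is a fixed partition of the 1970 clips. Since the authors release no code, the following is only a minimal sketch of reproducing that partition over clip indices; the variable names and the assumption of a contiguous, index-ordered split are hypothetical.

```python
# Sketch of the MSVD split described in the paper.
# Assumption (not stated in the paper): clips are ordered by index and
# partitioned contiguously, as is common practice for this dataset.
NUM_VIDEOS = 1970  # total MSVD clips

video_ids = list(range(NUM_VIDEOS))
train_ids = video_ids[:1200]      # 1200 training videos
val_ids = video_ids[1200:1300]    # 100 validation videos
test_ids = video_ids[1300:]       # remaining 670 testing videos

# The three subsets must cover the dataset without overlap.
assert len(train_ids) + len(val_ids) + len(test_ids) == NUM_VIDEOS
```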