Video Captioning with Tube Features

Authors: Bin Zhao, Xuelong Li, Xiaoqiang Lu

Venue: IJCAI 2018

Reproducibility assessment: each entry lists the variable, its result, and the supporting LLM response.

Research Type: Experimental
LLM Response: "We evaluate our approach on two benchmark datasets: MSVD and Charades. The experimental results demonstrate the effectiveness of tube features in the video captioning task."

Researcher Affiliation: Academia
LLM Response: "Bin Zhao (1), Xuelong Li (2), Xiaoqiang Lu (2). (1) School of Computer Science and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an 710072, P. R. China. (2) Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, P. R. China"

Pseudocode: No
LLM Response: "The paper describes its procedures and calculations with mathematical equations and textual descriptions, but does not include structured pseudocode or algorithm blocks."

Open Source Code: No
LLM Response: "The paper contains no statement about releasing open-source code for the described methodology, and no link to a code repository."

Open Datasets: Yes
LLM Response: "The MSVD [Guadarrama et al., 2013] dataset contains 1970 video clips collected from YouTube."

Dataset Splits: Yes
LLM Response: "In this paper, the dataset is split into three sets: 1200 videos for training, 100 videos for validation, and the remaining 670 videos for testing." (See the split sketch after this list.)

Hardware Specification: No
LLM Response: "The paper does not provide specific hardware details, such as GPU models, CPU types, or memory sizes, used to run the experiments."

Software Dependencies: No
LLM Response: "The paper mentions models and datasets such as Faster-RCNN, VGG16, ResNet-101, PASCAL VOC, Microsoft COCO, and GloVe, but gives no version numbers for the software libraries or dependencies used in the implementation."

Experiment Setup: Yes
LLM Response: "Following existing approaches [Venugopalan et al., 2015a], T is fixed as 80 in this paper, which is bigger than the longest caption." (See the padding sketch after this list.)
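The MSVD split quoted in the Dataset Splits entry is a simple 1200/100/670 partition of the 1970 clips. Below is a minimal sketch of how such a split is commonly materialized; the function name and the assumption that clips are taken in their conventional MSVD order are illustrative, as the paper does not specify the indexing.

```python
# Minimal sketch of the MSVD train/validation/test partition quoted above.
# Assumes `video_ids` holds the 1970 MSVD clip identifiers in their
# conventional order; the paper does not state how the split is indexed.

def split_msvd(video_ids):
    assert len(video_ids) == 1970, "MSVD contains 1970 video clips"
    train = video_ids[:1200]       # 1200 videos for training
    val = video_ids[1200:1300]     # 100 videos for validation
    test = video_ids[1300:]        # remaining 670 videos for testing
    return train, val, test
```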
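The Experiment Setup entry fixes the caption length T at 80, longer than any ground-truth caption, which implies captions are padded (and would otherwise be truncated) to a fixed length before training. A minimal sketch follows, assuming conventional `<eos>`/`<pad>` tokens that the paper does not spell out.

```python
# Minimal sketch of fixing the caption length to T = 80, as in the quoted
# experiment setup. The <eos>/<pad> token conventions are assumptions;
# the paper only states that T is fixed at 80, longer than any caption.
T = 80

def pad_caption(tokens, pad="<pad>", eos="<eos>"):
    tokens = tokens[: T - 1] + [eos]            # truncate defensively, then mark end
    return tokens + [pad] * (T - len(tokens))   # pad out to exactly T tokens

print(pad_caption("a man is playing a guitar".split())[:8])
# ['a', 'man', 'is', 'playing', 'a', 'guitar', '<eos>', '<pad>']
```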