Multi-modal Dependency Tree for Video Captioning

Authors: Wentian Zhao, Xinxiao Wu, Jiebo Luo

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on several video captioning datasets demonstrate the effectiveness of the proposed method.
Researcher Affiliation | Academia | Wentian Zhao, Xinxiao Wu — Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, No. 5, Zhongguancun South Street, Beijing, China ({wentian_zhao,wuxinxiao}@bit.edu.cn); Jiebo Luo — Department of Computer Science, University of Rochester, Rochester, NY 14627, USA (jluo@cs.rochester.edu)
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide any concrete access information (e.g., a repository link or an explicit statement of code release) for its methodology.
Open Datasets | Yes | ActivityNet Captions is a dense video captioning dataset that contains 10,030 training videos, 4,926 validation videos and 5,044 test videos. Charades Captions is composed of 9,223 videos of indoor activities. MSVD consists of 1,970 video clips collected from YouTube. MSR-VTT is a dataset collected for open-domain video captioning.
Dataset Splits | Yes | ActivityNet Captions is a dense video captioning dataset that contains 10,030 training videos, 4,926 validation videos and 5,044 test videos. Following [15], we split this dataset into 6,963 training videos, 500 validation videos and 1,760 test videos. We follow [9] to split the dataset into 1,200 training videos, 100 validation videos and 670 test videos. We use the splits provided by [15], where the training split, validation split and test split contain 6,513 videos, 497 videos and 2,990 videos, respectively.
Hardware Specification | Yes | The experiments are conducted using one NVIDIA RTX 2080Ti GPU.
Software Dependencies | No | The paper mentions using 'the dependency parser in the spaCy toolkit' and the 'Adam optimizer' but does not provide specific version numbers for these or any other software components.
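The paper parses captions into syntactic dependency trees with spaCy's dependency parser (version unspecified). As a toolkit-free illustration of the resulting data structure, the sketch below encodes a parse as (token, head index, dependency label) triples and walks the tree from the root; the example sentence and labels are illustrative, not taken from the paper.

```python
# Illustrative dependency tree for a short caption. Each token records the
# index of its syntactic head; head_index -1 marks the root. The sentence
# and labels are assumptions for demonstration, not the paper's data.
from collections import defaultdict

parse = [
    ("man",    2, "nsubj"),   # 0: subject of "riding"
    ("is",     2, "aux"),     # 1: auxiliary of "riding"
    ("riding", -1, "ROOT"),   # 2: root verb
    ("a",      4, "det"),     # 3: determiner of "horse"
    ("horse",  2, "dobj"),    # 4: direct object of "riding"
]

def children_of(parse):
    """Map each head index to the indices of its dependents."""
    children = defaultdict(list)
    for i, (_, head, _) in enumerate(parse):
        if head >= 0:
            children[head].append(i)
    return children

def subtree(parse, root):
    """Collect token indices of the subtree rooted at `root`, in preorder."""
    kids = children_of(parse)
    order = [root]
    for child in kids[root]:
        order.extend(subtree(parse, child))
    return order

root = next(i for i, (_, head, _) in enumerate(parse) if head == -1)
print([parse[i][0] for i in subtree(parse, root)])
# → ['riding', 'man', 'is', 'horse', 'a']
```

With spaCy itself, the same information is exposed per token via `token.head` and `token.dep_` after running a loaded pipeline over the caption.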
Experiment Setup | Yes | The initial learning rate is set to 5×10^-5 and is multiplied by 0.8 every 3 epochs.
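The reported schedule is a stepwise exponential decay. A minimal sketch, assuming 0-indexed epochs (the decay granularity beyond "every 3 epochs" is not specified in the paper):

```python
# Stepwise LR decay as reported: start at 5e-5, multiply by 0.8 every
# 3 epochs. Epoch indexing (0-based) is an assumption.
def learning_rate(epoch, base_lr=5e-5, decay=0.8, step=3):
    """Learning rate for a given epoch under stepwise exponential decay."""
    return base_lr * (decay ** (epoch // step))

print(learning_rate(0))   # → 5e-05
print(learning_rate(3))   # → 4e-05  (one decay applied)
print(learning_rate(6))   # → 3.2e-05 (two decays applied)
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.8)` on top of an optimizer initialized with `lr=5e-5`.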