Multi-modal Dependency Tree for Video Captioning
Authors: Wentian Zhao, Xinxiao Wu, Jiebo Luo
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on several video captioning datasets demonstrate the effectiveness of the proposed method. |
| Researcher Affiliation | Academia | Wentian Zhao, Xinxiao Wu: Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, No. 5 Zhongguancun South Street, Beijing, China ({wentian_zhao,wuxinxiao}@bit.edu.cn). Jiebo Luo: Department of Computer Science, University of Rochester, Rochester, NY 14627, USA (jluo@cs.rochester.edu). |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release) for its methodology. |
| Open Datasets | Yes | ActivityNet Captions is a dense video captioning dataset that contains 10,030 training videos, 4,926 validation videos and 5,044 test videos. Charades Captions is composed of 9,223 videos of indoor activities. MSVD consists of 1,970 video clips collected from YouTube. MSR-VTT is a dataset collected for open-domain video captioning. |
| Dataset Splits | Yes | ActivityNet Captions is a dense video captioning dataset that contains 10,030 training videos, 4,926 validation videos and 5,044 test videos. Following [15], we split this dataset [Charades Captions] into 6,963 training videos, 500 validation videos and 1,760 test videos. We follow [9] to split the dataset [MSVD] into 1,200 training videos, 100 validation videos and 670 testing videos. We use the splits provided by [15], where the training split, validation split and test split [of MSR-VTT] contain 6,513 videos, 497 videos and 2,990 videos, respectively. (These splits are summarized in the first sketch after the table.) |
| Hardware Specification | Yes | The experiments are conducted using one NVIDIA RTX 2080Ti GPU. |
| Software Dependencies | No | The paper mentions using 'the dependency parser in the spaCy toolkit' and the 'Adam optimizer' but does not provide specific version numbers for these or any other software components. (A minimal spaCy parsing sketch follows the table.) |
| Experiment Setup | Yes | The initial learning rate is set to 5 × 10⁻⁵ and decays by a factor of 0.8 every 3 epochs. (A schedule sketch follows the table.) |
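
For reference, the quoted split sizes can be gathered into one structure. This is an illustrative summary only: the dataset-to-split mapping follows the per-dataset totals quoted above (e.g., 6,963 + 500 + 1,760 = 9,223 Charades videos), and the constant name is hypothetical.

```python
# Illustrative summary of the splits quoted in the table; the constant name
# DATASET_SPLITS is hypothetical, the counts come from the paper's quotes.
DATASET_SPLITS = {
    "ActivityNet Captions": {"train": 10_030, "val": 4_926, "test": 5_044},
    "Charades Captions":    {"train": 6_963,  "val": 500,   "test": 1_760},
    "MSVD":                 {"train": 1_200,  "val": 100,   "test": 670},
    "MSR-VTT":              {"train": 6_513,  "val": 497,   "test": 2_990},
}
```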
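The paper reports using the dependency parser in the spaCy toolkit to obtain dependency trees of captions, but names no model or version. The sketch below shows one way such a parse could be obtained; the model name `en_core_web_sm` and the example caption are assumptions, not taken from the paper.

```python
# Minimal sketch of parsing a caption into a dependency tree with spaCy.
# The model "en_core_web_sm" and the example sentence are assumptions.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("a man is playing the guitar")

for token in doc:
    # token.dep_ is the dependency relation, token.head is the parent word
    print(f"{token.text:>8} --{token.dep_}--> {token.head.text}")
```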
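The only optimization details quoted are the Adam optimizer, an initial learning rate of 5 × 10⁻⁵, and a decay factor of 0.8 every 3 epochs. Below is a minimal sketch of that schedule, assuming PyTorch and its `StepLR` scheduler; neither the framework nor the epoch count is confirmed by the paper.

```python
# Minimal sketch of the reported schedule: Adam, lr = 5e-5, x0.8 every 3 epochs.
# PyTorch and StepLR are assumptions; the model below is a placeholder.
import torch

model = torch.nn.Linear(512, 512)  # placeholder, not the paper's captioning model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.8)

num_epochs = 30  # hypothetical; the paper's epoch count is not quoted here
for epoch in range(num_epochs):
    # ... one training epoch over the video-caption batches would go here ...
    scheduler.step()  # multiplies the learning rate by 0.8 every 3 epochs
```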