Weakly Supervised Dense Event Captioning in Videos

Authors: Xuguang Duan, Wenbing Huang, Chuang Gan, Jingdong Wang, Wenwu Zhu, Junzhou Huang

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results are provided to demonstrate the ability of our model on both dense event captioning and sentence localization in videos. |
| Researcher Affiliation | Collaboration | 1 Tsinghua University, Beijing, China; 2 Tencent AI Lab; 3 MIT-IBM Watson AI Lab; 4 Microsoft Research Asia, Beijing, China |
| Pseudocode | No | The paper describes the methods in text and uses mathematical formulations, but it does not include any pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | Details about training are provided in the Supplementary materials and our Github repository. |
| Open Datasets | Yes | We conduct experiments on the ActivityNet Captions [10] dataset, which has been applied as the benchmark for dense video captioning. |
| Dataset Splits | Yes | We follow the suggested protocol by [10, 11] to use 50% of the videos for training, 25% for validation, and 25% for testing. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | Yes | Our code is implemented by Pytorch-0.3. |
| Experiment Setup | Yes | The trade-off parameters in our loss, i.e., λs and λa, are both set to 0.1. We train our model using stochastic gradient descent with an initial learning rate of 0.01 and a momentum factor of 0.9 (sketched below). |
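
Based on the reported setup, the following is a minimal PyTorch sketch of the optimizer configuration and loss weighting. The stand-in network, loss terms, and the way the weighted terms are combined are illustrative assumptions, not taken from the authors' repository, and the sketch uses current PyTorch APIs rather than the Pytorch-0.3 version named in the paper.

```python
import torch
import torch.nn as nn

# Reported hyperparameters: both trade-off weights set to 0.1.
LAMBDA_S = 0.1  # trade-off weight lambda_s
LAMBDA_A = 0.1  # trade-off weight lambda_a

# Stand-in network; the actual caption generator and sentence localizer
# live in the authors' repository and are not reproduced here.
model = nn.Linear(128, 10)

# SGD with the reported initial learning rate and momentum factor.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train_step(features, caption_target, loc_target):
    """One hypothetical update combining a main loss with two weighted
    auxiliary terms, mirroring the reported lambda_s / lambda_a weighting."""
    out = model(features)
    caption_loss = nn.functional.cross_entropy(out, caption_target)
    sentence_loss = nn.functional.mse_loss(out.mean(dim=1), loc_target)
    aux_loss = out.pow(2).mean()  # placeholder for the second auxiliary term
    total = caption_loss + LAMBDA_S * sentence_loss + LAMBDA_A * aux_loss
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()

# Example call with random data.
loss = train_step(torch.randn(4, 128),
                  torch.randint(0, 10, (4,)),
                  torch.randn(4))
```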