Weakly Supervised Dense Event Captioning in Videos
Authors: Xuguang Duan, Wenbing Huang, Chuang Gan, Jingdong Wang, Wenwu Zhu, Junzhou Huang
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results are provided to demonstrate the ability of our model on both dense event captioning and sentence localization in videos. |
| Researcher Affiliation | Collaboration | 1 Tsinghua University, Beijing, China; 2 Tencent AI Lab; 3 MIT-IBM Watson AI Lab; 4 Microsoft Research Asia, Beijing, China |
| Pseudocode | No | The paper describes the methods in text and uses mathematical formulations, but it does not include any pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | Details about training are provided in the Supplementary materials and our GitHub repository. |
| Open Datasets | Yes | We conduct experiments on the ActivityNet Captions [10] dataset, which has been adopted as the benchmark for dense video captioning. |
| Dataset Splits | Yes | We follow the suggested protocol by [10, 11] to use 50% of the videos for training, 25% for validation, and 25% for testing. (A split sketch follows the table.) |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | Yes | Our code is implemented by Pytorch-0.3. |
| Experiment Setup | Yes | The trade-off parameters in our loss, i.e., λs and λa, are both set to 0.1. We train our model using stochastic gradient descent with an initial learning rate of 0.01 and a momentum factor of 0.9. (A training-setup sketch follows the table.) |
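
The reported split keeps 50% of the videos for training and 25% each for validation and testing. Below is a minimal sketch of those proportions only; note that ActivityNet Captions ships with the partition suggested by [10, 11], so the `split_videos` helper and its random seed here are purely illustrative, not the benchmark procedure.

```python
import random

def split_videos(video_ids, seed=0):
    """Illustrative 50/25/25 split. The actual benchmark uses the
    partition suggested by [10, 11], not a fresh random shuffle."""
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    return (
        ids[: n // 2],             # 50% training
        ids[n // 2 : 3 * n // 4],  # 25% validation
        ids[3 * n // 4 :],         # 25% testing
    )
```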
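
For the training setup, the following is a minimal PyTorch sketch of the reported hyperparameters (SGD with learning rate 0.01, momentum 0.9, and trade-off weights λs = λa = 0.1). The `caption_model` and `localizer` modules and the three loss terms are hypothetical placeholders; the paper, not this sketch, defines the actual networks and objective terms.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-ins for the paper's caption generator and
# sentence localizer; the architectures here are placeholders only.
caption_model = nn.LSTM(input_size=500, hidden_size=512)
localizer = nn.Linear(512, 2)

params = list(caption_model.parameters()) + list(localizer.parameters())

# Reported optimizer settings: SGD, initial lr 0.01, momentum 0.9.
optimizer = optim.SGD(params, lr=0.01, momentum=0.9)

# Reported trade-off parameters in the loss.
lambda_s, lambda_a = 0.1, 0.1

def total_loss(main_loss, aux_loss_s, aux_loss_a):
    # Weighted sum of the main captioning objective and two auxiliary
    # terms scaled by lambda_s and lambda_a; the precise definition of
    # each term is given in the paper, not here.
    return main_loss + lambda_s * aux_loss_s + lambda_a * aux_loss_a
```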