Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Weakly Supervised Dense Event Captioning in Videos
Authors: Xuguang Duan, Wenbing Huang, Chuang Gan, Jingdong Wang, Wenwu Zhu, Junzhou Huang
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results are provided to demonstrate the ability of our model on both dense event captioning and sentence localization in videos. |
| Researcher Affiliation | Collaboration | 1 Tsinghua University, Beijing, China; 2 Tencent AI Lab; 3 MIT-IBM Watson AI Lab; 4 Microsoft Research Asia, Beijing, China |
| Pseudocode | No | The paper describes the methods in text and uses mathematical formulations, but it does not include any pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | Details about training are provided in the Supplementary materials and our Github repository. |
| Open Datasets | Yes | We conduct experiments on the ActivityNet Captions [10] dataset that has been applied as the benchmark for dense video captioning. |
| Dataset Splits | Yes | We follow the suggested protocol by [10, 11] to use 50% of the videos for training, 25% for validation, and 25% for testing. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | Yes | Our code is implemented by Pytorch-0.3. |
| Experiment Setup | Yes | The trade-off parameters in our loss, i.e., λs and λa are both set to 0.1. We train our model by using the stochastic gradient descent with the initial learning rate as 0.01 and momentum factor as 0.9. |
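The experiment-setup row reports a weighted loss (λs = λa = 0.1) optimized with SGD (learning rate 0.01, momentum 0.9). A minimal plain-Python sketch of those two pieces is below; all names here are hypothetical and stand in for the authors' actual PyTorch 0.3 implementation, which is not reproduced in this report.

```python
# Hypothetical sketch of the reported training configuration.
# Constants are taken directly from the table above; function and
# variable names are illustrative, not from the authors' code.

LAMBDA_S = 0.1   # trade-off weight for the first auxiliary loss term
LAMBDA_A = 0.1   # trade-off weight for the second auxiliary loss term
LR = 0.01        # initial SGD learning rate
MOMENTUM = 0.9   # SGD momentum factor


def total_loss(l_caption: float, l_s: float, l_a: float) -> float:
    """Combine the main captioning loss with the two weighted auxiliary terms."""
    return l_caption + LAMBDA_S * l_s + LAMBDA_A * l_a


def sgd_momentum_step(param: float, grad: float, velocity: float):
    """One SGD-with-momentum update: v <- m*v + g, then p <- p - lr*v."""
    velocity = MOMENTUM * velocity + grad
    return param - LR * velocity, velocity
```

In a training loop, `total_loss` would be computed per batch and `sgd_momentum_step` applied per parameter; in practice the paper delegates both to PyTorch's built-in loss and `torch.optim.SGD` machinery rather than hand-rolled updates like these.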