An Efficient Framework for Dense Video Captioning

Authors: Maitreya Suin, A. N. Rajagopalan

AAAI 2020, pp. 12039-12046

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive evaluations on the ActivityNet Captions dataset to validate our method."
Researcher Affiliation | Academia | Maitreya Suin, A. N. Rajagopalan, Indian Institute of Technology Madras (maitreyasuin21@gmail.com, raju@ee.iitm.ac.in)
Pseudocode | No | The paper describes methods and architectures verbally and mathematically but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a link to open-source code or explicitly state that code for the described methodology is available.
Open Datasets | Yes | "ActivityNet Captions (Krishna et al. 2017) is one of the largest datasets containing multiple annotated temporal event segments and corresponding natural language sentences describing those events."
Dataset Splits | Yes | "It contains almost 20,000 YouTube videos, which include 10,024, 4,926 and 5,044 videos for the training, validation and test splits, respectively."
Hardware Specification | No | The paper mentions using ResNet-200 for feature extraction and discusses computational cost (GFLOPs), but it does not specify the hardware (e.g., GPU or CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions Adam (Kingma and Ba 2014) as its optimizer but does not name any software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | "We leverage Adam (Kingma and Ba 2014) with an initial learning rate of 0.001. We apply the well-known regularization technique Dropout (Srivastava et al. 2014) to regularize the training and prevent over-fitting."
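
The experiment-setup row above is concrete enough to sketch in code. Below is a minimal, hypothetical rendering in PyTorch; the paper does not name its framework, model dimensions, or dropout rate, so the `CaptionDecoder` module, the feature size of 2048, and `p_drop=0.5` are illustrative assumptions, not the authors' implementation. Only the Adam optimizer with an initial learning rate of 0.001 and the use of Dropout come from the paper.

```python
# Minimal sketch of the reported training setup: Adam (lr=0.001) plus
# Dropout regularization. Everything else here (module structure,
# dimensions, dropout probability) is a hypothetical placeholder; the
# paper does not specify these details.
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    """Hypothetical stand-in for the paper's captioning network."""

    def __init__(self, feat_dim=2048, hidden_dim=512, vocab_size=10000, p_drop=0.5):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden_dim)
        self.dropout = nn.Dropout(p_drop)  # Dropout (Srivastava et al. 2014)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats):
        x = self.dropout(torch.relu(self.proj(feats)))
        h, _ = self.lstm(x)
        return self.out(h)

model = CaptionDecoder()
# Adam with the initial learning rate stated in the paper (0.001).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One dummy optimization step over random "video features".
feats = torch.randn(4, 16, 2048)            # (batch, time steps, feature dim)
targets = torch.randint(0, 10000, (4, 16))  # (batch, time steps) word indices

optimizer.zero_grad()
logits = model(feats)                       # (batch, time steps, vocab)
loss = criterion(logits.reshape(-1, 10000), targets.reshape(-1))
loss.backward()
optimizer.step()
```

Since the paper reports no framework versions, any reproduction attempt would need to make similar choices itself; the sketch simply shows that the two stated hyperparameter facts (optimizer and learning rate, plus Dropout) fix only a small part of the full training configuration.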