Set Prediction Guided by Semantic Concepts for Diverse Video Captioning
Authors: Yifan Lu, Ziqi Zhang, Chunfeng Yuan, Peng Li, Yan Wang, Bing Li, Weiming Hu
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on benchmark datasets show that the proposed SCG-SP achieves state-of-the-art (SOTA) performance under both relevance and diversity metrics. |
| Researcher Affiliation | Collaboration | (1) State Key Laboratory of Multimodal Artificial Intelligence Systems, CASIA; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences; (3) Alibaba Group; (4) Zhejiang Linkheer Science and Technology Co., Ltd.; (5) School of Information Science and Technology, ShanghaiTech University |
| Pseudocode | No | The paper describes the model architecture and process in detail but does not include a formal pseudocode block or algorithm. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the code for the described methodology, nor does it provide a direct link to a source-code repository. |
| Open Datasets | Yes | MSVD contains 1,970 videos from YouTube. Each video is annotated with 41 captions on average. We follow the split of 1200/100/670 for training, validation, and test. MSRVTT contains 10,000 open-domain videos. Each video is annotated with 20 captions. We follow the split of 6513/497/2990 for training, validation, and test. VATEX contains 34,991 videos, each with 10 English captions. We follow the split of 25991/3000/6000 for training, validation, and test. |
| Dataset Splits | Yes | We follow the split of 1200/100/670 for training, validation, and test [for MSVD]. We follow the split of 6513/497/2990 for training, validation, and test [for MSRVTT]. We follow the split of 25991/3000/6000 for training, validation, and test [for VATEX]. |
| Hardware Specification | Yes | The model is implemented with PyTorch, and all the experiments are conducted on a single RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions PyTorch and GPT-2 but does not specify their version numbers or the version numbers of any other key software dependencies. |
| Experiment Setup | Yes | The prefix length is set to 10. The weights of loss terms are set as λ = 1 and λ_d = 0.5. We apply AdamW as the optimizer. The learning rate and batch size are set to 8e-5 and 32 for SCG-SP-LSTM, and 1e-5 and 8 for SCG-SP-Prefix. We use beam search with size 3 for generation at the inference stage. |
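The reported hyperparameters can be collected into a small configuration sketch for reproduction. The variant names (SCG-SP-LSTM, SCG-SP-Prefix) and values come from the paper; the `training_config` helper itself is illustrative and not part of the authors' (unreleased) code:

```python
# Per-variant settings reported in the paper.
CONFIGS = {
    "SCG-SP-LSTM":   {"optimizer": "AdamW", "lr": 8e-5, "batch_size": 32},
    "SCG-SP-Prefix": {"optimizer": "AdamW", "lr": 1e-5, "batch_size": 8},
}

# Settings shared by both decoder variants.
SHARED = {
    "prefix_length": 10,  # prompt prefix length
    "lambda": 1.0,        # weight of the main loss term (λ)
    "lambda_d": 0.5,      # weight of the diversity loss term (λ_d)
    "beam_size": 3,       # beam search width at inference
}

def training_config(variant: str) -> dict:
    """Merge the shared settings with the variant-specific ones."""
    return {**SHARED, **CONFIGS[variant]}
```

For example, `training_config("SCG-SP-Prefix")` yields the learning rate 1e-5 and batch size 8 alongside the shared prefix length, loss weights, and beam size.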