Dual Video Summarization: From Frames to Captions
Authors: Zhenzhen Hu, Zhenshan Wang, Zijie Song, Richang Hong
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiment results on the MSR-VTT and MSVD datasets reveal that, for a generative task such as video captioning, a small number of keyframes can convey the same semantic information and perform as well on captioning as the original sampling, or even better. |
| Researcher Affiliation | Academia | Zhenzhen Hu (1,2), Zhenshan Wang (1), Zijie Song (1) and Richang Hong (1); (1) Hefei University of Technology; (2) Institute of Artificial Intelligence, Hefei Comprehensive National Science Center |
| Pseudocode | No | The paper describes its framework and process in text and diagrams but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We evaluate our model on MSR-VTT [Xu et al., 2016] and MSVD [Chen and Dolan, 2011] datasets. |
| Dataset Splits | Yes | For MSR-VTT: "We split the data into a 6,513 training set, 497 validation set and 2,990 testing set." For MSVD: "We follow the data split of 1,200 videos for training, 100 videos for validation and the rest for testing." (Both splits are encoded in the first sketch after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions the use of the "Adam optimizer" and "pre-trained CLIP [Radford et al., 2021] with 12 layers ViT-B/32" but does not provide version numbers for any software dependencies or libraries (see the CLIP sketch after the table). |
| Experiment Setup | Yes | Our summarizer module is trained for 10 epochs on the above datasets with a learning rate of 1e-3 and dropout of 0.2. Our captioning module is trained with a learning rate of 1e-4 for 40 epochs, and we set the batch size to 32. Both the summarizer and the captioning decoder employ the Adam optimizer [Kingma and Ba, 2014] to minimize the loss (see the training sketch after the table). |
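For convenience, the two reported splits can be written down as a small configuration. This is a minimal sketch; the dictionary layout and variable names are illustrative, not from the paper, and the 670-video MSVD test count follows from "the rest" of its 1,970 clips.

```python
# Dataset splits as reported in the paper (counts are video-level).
# The dictionary structure itself is illustrative, not from the paper.
DATASET_SPLITS = {
    "MSR-VTT": {"train": 6513, "val": 497, "test": 2990},
    "MSVD": {"train": 1200, "val": 100, "test": 670},  # "the rest" of 1,970 videos
}

for name, split in DATASET_SPLITS.items():
    total = sum(split.values())
    print(f"{name}: {split} (total {total} videos)")
```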
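The paper names pre-trained CLIP with a 12-layer ViT-B/32 image encoder but releases no code. Below is a minimal sketch of extracting per-frame features with OpenAI's `clip` package; the frame paths are hypothetical and the frame-sampling strategy is an assumption, not the authors' pipeline.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# ViT-B/32: the 12-layer vision transformer backbone named in the paper.
model, preprocess = clip.load("ViT-B/32", device=device)
model.eval()

# Hypothetical list of sampled frame paths; adapt to your own extraction step.
frame_paths = ["frames/frame_000.jpg", "frames/frame_001.jpg"]

with torch.no_grad():
    frames = torch.stack([preprocess(Image.open(p)) for p in frame_paths]).to(device)
    features = model.encode_image(frames)  # shape (num_frames, 512) for ViT-B/32

print(features.shape)
```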
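The reported hyperparameters map onto a standard PyTorch training setup. The sketch below encodes them; `Summarizer` and `CaptionDecoder` are hypothetical placeholders for the paper's unreleased modules, and only the optimizer choice, learning rates, dropout, epoch counts, and batch size come from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the paper's unreleased summarizer and captioning decoder.
class Summarizer(nn.Module):
    def __init__(self, feat_dim=512, dropout=0.2):  # dropout 0.2 as reported
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(feat_dim, 1),
        )

    def forward(self, frame_feats):                   # (batch, num_frames, feat_dim)
        return self.scorer(frame_feats).squeeze(-1)   # per-frame importance scores

class CaptionDecoder(nn.Module):
    def __init__(self, feat_dim=512, vocab_size=10000):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.out = nn.Linear(feat_dim, vocab_size)

    def forward(self, keyframe_feats):                # (batch, num_keyframes, feat_dim)
        hidden, _ = self.rnn(keyframe_feats)
        return self.out(hidden)                       # per-step vocabulary logits

summarizer, captioner = Summarizer(), CaptionDecoder()

# Hyperparameters exactly as reported in the paper.
SUMMARIZER_EPOCHS, CAPTIONER_EPOCHS, BATCH_SIZE = 10, 40, 32
summarizer_opt = torch.optim.Adam(summarizer.parameters(), lr=1e-3)
captioner_opt = torch.optim.Adam(captioner.parameters(), lr=1e-4)
```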