Video Summarization via Label Distributions Dual-Reward

Authors: Yongbiao Gao, Ning Xu, Xin Geng

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on several benchmark datasets show that our proposed method outperforms other approaches under various settings.
Researcher Affiliation | Academia | School of Computer Science and Engineering, Southeast University, Nanjing, China {gaoyb, xning, xgeng}@seu.edu.cn
Pseudocode | No | The paper describes algorithms but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | All the datasets, the code as well as the trained models have been released. Footnote 1: http://palm.seu.edu.cn/xgeng/
Open Datasets | Yes | We evaluate our approach on two widely used benchmark datasets, SumMe [Song et al., 2015] and TVSum [Gygli et al., 2014]. ... we use other two datasets from YouTube [Avila et al., 2011] and Open Video Project (OVP) as auxiliary datasets to conduct augmented and transfer experiments. Footnote 2: Open video project. https://open-video.org/.
Dataset Splits | Yes | We use three settings to evaluate our method. (1) Canonical: we use the standard 5-fold cross validation (5FCV).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using 'GoogLeNet pre-trained on ImageNet' and the 'deep deterministic policy gradient (DDPG) algorithm' but does not specify software versions for reproducibility.
Experiment Setup | Yes | The hyperparameters δ in Eq. 5, α in Eq. 10, τ in Eq. 15 and 16, N in Eq. 17 are 0.2, 0.3, 0.001 and 4, respectively. The learning rate is 1e-04 for actor and 1e-03 for critic. The batch size is 32. The discount factor γ is 0.99. And the size of the memory capacity is 10000 for TVSum and 5000 for SumMe. The limited length of video summaries φ is 15% of the whole video length. ... The LSTM layer includes 128 units. The time step of LSTM is 10. The actor network has one label distribution layer. The label distribution layer has 5 units. The critic has two fully connected layers including 32 and 64 units for SumMe dataset, 300 and 600 units for TVSum dataset respectively. The output layer of the critic network has 1 unit. The parameters y_min and y_max are 1 and 5 in Eq. 1.
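
The values quoted in the Experiment Setup row can be gathered into one configuration object per dataset. The Python below is a minimal sketch, not the authors' released code: the names `ExperimentConfig`, `CONFIGS`, `summary_budget` and `soft_update` are hypothetical, and every field name is chosen only for illustration. Two assumptions go beyond the quoted text: that the 15% length limit is applied by rounding down to whole frames, and that Eq. 15 and 16 are the standard DDPG soft target updates, in which τ blends the online weights into the target weights.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters quoted in the Experiment Setup row (field names are illustrative)."""
    dataset: str
    critic_hidden_units: Tuple[int, int]  # two fully connected critic layers
    memory_capacity: int                  # replay memory size
    delta: float = 0.2                    # delta in Eq. 5
    alpha: float = 0.3                    # alpha in Eq. 10
    tau: float = 0.001                    # tau in Eq. 15 and 16
    n: int = 4                            # N in Eq. 17
    actor_lr: float = 1e-4
    critic_lr: float = 1e-3
    batch_size: int = 32
    gamma: float = 0.99                   # discount factor
    summary_percent: int = 15             # phi, as a percentage of video length
    lstm_units: int = 128
    lstm_time_steps: int = 10
    label_distribution_units: int = 5
    y_min: int = 1                        # lower label bound in Eq. 1
    y_max: int = 5                        # upper label bound in Eq. 1


# Only the critic layer sizes and the memory capacity differ between datasets.
CONFIGS = {
    "SumMe": ExperimentConfig("SumMe", (32, 64), 5000),
    "TVSum": ExperimentConfig("TVSum", (300, 600), 10000),
}


def summary_budget(n_frames: int, percent: int = 15) -> int:
    """Maximum summary length in frames; rounding down is an assumption."""
    return n_frames * percent // 100


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Standard DDPG soft target update: tau * online + (1 - tau) * target.

    Assumed to correspond to Eq. 15 and 16; the source does not show them.
    """
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]
```

For example, `summary_budget(1000)` gives a budget of 150 frames, and with τ = 0.001 each call to `soft_update` moves a target weight only 0.1% of the way toward its online counterpart, which is why target networks in DDPG change slowly.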