Deep Reinforcement Learning for Unsupervised Video Summarization With Diversity-Representativeness Reward

Authors: Kaiyang Zhou, Yu Qiao, Tao Xiang

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on two benchmark datasets show that our unsupervised method not only outperforms other state-of-the-art unsupervised methods, but also is comparable to or even superior than most of published supervised approaches.
Researcher Affiliation | Academia | 1 Guangdong Key Lab of Computer Vision and Virtual Reality, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; 2 Queen Mary University of London, UK; k.zhou@qmul.ac.uk, yu.qiao@siat.ac.cn
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Codes are available on https://github.com/KaiyangZhou/vsumm-reinforce
Open Datasets | Yes | We evaluate our methods on SumMe (Gygli et al. 2014) and TVSum (Song et al. 2015). SumMe consists of 25 user videos covering various topics such as holidays and sports. ... TVSum contains 50 videos, which include the topics of news, documentaries, etc. ... In addition to these two datasets, we exploit two other datasets, OVP (Open Video Project, https://open-video.org/) that has 50 videos and YouTube (De Avila et al. 2011) that has 39 videos excluding cartoon videos.
Dataset Splits | Yes | We use three settings as suggested in (Zhang et al. 2016b) to evaluate our method. (1) Canonical: we use the standard 5-fold cross validation (5FCV), i.e., 80% of videos for training and the rest for testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running experiments.
Software Dependencies | No | The paper mentions 'Theano (Al-Rfou et al. 2016)' but does not specify its version number or any other software dependencies with version numbers.
Experiment Setup | Yes | We set the temporal distance λ to 20, the ϵ in Eq. 11 to 0.5, and the number of episodes N to 5. The other hyperparameters α, β1 and β2 in Eq. 13 are optimized via cross-validation. We set the dimension of hidden state in the RNN cell to 256 throughout this paper. Training is stopped when it reaches a maximum number of epochs (60 in our case). Early stopping is executed when reward ceases to increase for a period of time (10 epochs in our experiments).
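
The Dataset Splits and Experiment Setup rows together pin down the reported training configuration. Below is a minimal Python sketch that collects those hyperparameters and reproduces the canonical 5-fold (80%/20%) cross-validation split; the names `config` and `canonical_5fcv_splits` are illustrative assumptions and are not taken from the authors' released code, and α, β1, β2 are omitted because their cross-validated values are not reported.

```python
import random

# Hyperparameters quoted from the paper's experiment setup.
config = {
    "temporal_distance_lambda": 20,  # temporal distance λ
    "epsilon": 0.5,                  # ϵ in Eq. 11
    "num_episodes": 5,               # number of episodes N per video
    "hidden_size": 256,              # RNN hidden-state dimension
    "max_epochs": 60,                # maximum number of training epochs
    "early_stop_patience": 10,       # stop if reward ceases to increase for 10 epochs
    # α, β1 and β2 in Eq. 13 are tuned via cross-validation (values not reported).
}

def canonical_5fcv_splits(video_ids, seed=0):
    """Canonical setting: standard 5-fold cross validation, i.e. in each fold
    80% of the videos are used for training and the remaining 20% for testing."""
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::5] for i in range(5)]  # 5 roughly equal folds
    for k in range(5):
        test = folds[k]
        train = [v for v in ids if v not in test]
        yield train, test

# Example: splitting the 25 SumMe videos into 5 train/test folds.
for train_ids, test_ids in canonical_5fcv_splits(range(25)):
    # train for up to config["max_epochs"] epochs with early stopping,
    # then evaluate the summarizer on test_ids
    pass
```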