Discriminative Feature Learning for Unsupervised Video Summarization

Authors: Yunjae Jung, Donghyeon Cho, Dahun Kim, Sanghyun Woo, In So Kweon

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of the proposed methods by conducting extensive ablation studies and show that our final model achieves new state-of-the-art results on two benchmark datasets.
Researcher Affiliation | Academia | Yunjae Jung, Donghyeon Cho, Dahun Kim, Sanghyun Woo, In So Kweon; Korea Advanced Institute of Science and Technology, Korea; {yun9298a, cdh12242}@gmail.com, {mcahny, shwoo93, iskweon77}@kaist.ac.kr
Pseudocode | No | The paper describes the proposed methods in narrative text and illustrates network architectures with diagrams (e.g., Figure 1), but it does not include any formal pseudocode blocks or algorithm listings.
Open Source Code | No | The paper does not contain any explicit statement about the release of source code, nor does it provide a link to a code repository for the described methodology.
Open Datasets | Yes | We evaluate our approach on two benchmark datasets, SumMe (Gygli et al. 2014) and TVSum (Song et al. 2015). ... OVP (De Avila et al. 2011) and YouTube (De Avila et al. 2011) datasets consist of 50 and 39 videos, respectively. We use OVP and YouTube datasets for transfer and augmented settings.
Dataset Splits | Yes | To divide the test set and the training set, we randomly extract the test set five times, 20% of the total. The remaining 80% of the videos is used for the training set. We use the final F-score, which is the average of the F-scores of the five tests. ... Table 1 (evaluation setting for SumMe; for TVSum, the roles of SumMe and TVSum are switched): Canonical setting trains on 80% of SumMe and tests on the remaining 20% of SumMe. (A sketch of this split protocol follows the table.)
Hardware Specification | No | The paper mentions using 'GoogLeNet pool-5' for feature extraction and 'LSTM' as part of the network architecture, but it does not specify any hardware details such as GPU models, CPU types, or other computing resources used for the experiments.
Software Dependencies | No | The paper states 'We implement our method using PyTorch.' However, it does not provide a specific version number for PyTorch or any other software dependency, which is required for reproducibility.
Experiment Setup | Yes | For input features, we extract frames at 2 fps as in (Zhang et al. 2016b), and then obtain a 1024-dimensional feature through GoogLeNet pool-5 (Szegedy et al. 2015) trained on ImageNet (Russakovsky et al. 2015). The LSTM input and hidden size is 256, reduced by an FC layer (1024 to 256) for fast convergence, and the weights are shared across each chunk and stride input. The maximum epoch is 20; the learning rate is 1e-4 and is decayed by a factor of 0.1 after 10 epochs. The weights of the network are randomly initialized. M in CSNet is experimentally picked as 4. (A configuration sketch also follows the table.)
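
Below is a minimal Python sketch of the evaluation protocol quoted in the Dataset Splits row, assuming the split is done at the video level: five random 80/20 train/test splits, with the reported result being the mean F-score over the five runs. The callables train_fn and score_fn are placeholders for the training routine and F-score computation, which the excerpt does not specify.

```python
import random
from typing import Callable, List, Sequence

def evaluate_five_splits(
    videos: Sequence,
    train_fn: Callable[[List], object],         # placeholder: trains a model on the training videos
    score_fn: Callable[[object, List], float],  # placeholder: F-score of a model on the test videos
    seed: int = 0,
) -> float:
    """Average F-score over five random 80/20 train/test splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(5):                      # "we randomly extract the test set five times"
        order = list(videos)
        rng.shuffle(order)
        n_test = round(0.2 * len(order))    # 20% of the videos held out for testing
        test, train = order[:n_test], order[n_test:]
        model = train_fn(train)
        scores.append(score_fn(model, test))
    return sum(scores) / len(scores)        # the final reported F-score
```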
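
And a minimal PyTorch sketch of the configuration quoted in the Experiment Setup row: 1024-D GoogLeNet pool-5 features reduced to 256-D by an FC layer and fed to an LSTM with input/hidden size 256, trained for 20 epochs at a learning rate of 1e-4 that is decayed by 0.1 after 10 epochs. The optimizer choice (Adam) and the module name FeatureEncoder are assumptions not stated in the excerpt.

```python
import torch
import torch.nn as nn

class FeatureEncoder(nn.Module):
    """FC (1024 -> 256) followed by an LSTM with input/hidden size 256."""
    def __init__(self, in_dim: int = 1024, hid: int = 256):
        super().__init__()
        self.fc = nn.Linear(in_dim, hid)               # reduce features for fast convergence
        self.lstm = nn.LSTM(hid, hid, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(self.fc(x))                 # x: (batch, frames, 1024)
        return out                                     # (batch, frames, 256)

model = FeatureEncoder()                               # weights randomly initialized by default
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # optimizer choice is an assumption
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(20):                                # "the maximum epoch is 20"
    # ... one training pass over the 2 fps GoogLeNet features would go here ...
    scheduler.step()                                   # learning rate x 0.1 after epoch 10
```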