Streaming Non-Monotone Submodular Maximization: Personalized Video Summarization on the Fly

Authors: Baharan Mirzasoleiman, Stefanie Jegelka, Andreas Krause

AAAI 2018

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments show that for video summarization, our method runs more than 1700 times faster than previous work, while maintaining practically the same performance. In this section, we apply STREAMING LOCAL SEARCH to video summarization in the streaming setting. The main goal of this section is to validate our theoretical results and demonstrate the effectiveness of our method in practical scenarios, where the existing streaming algorithms are incapable of providing any quality guarantee for the solutions. Table 1: Performance of various video summarization methods with segment size 10 on YouTube and OVP datasets, measured by F-Score (F), Precision (P), and Recall (R).
Researcher Affiliation Academia Baharan Mirzasoleiman, ETH Zurich, Switzerland (baharanm@ethz.ch); Stefanie Jegelka, MIT, United States (stefje@mit.edu); Andreas Krause, ETH Zurich, Switzerland (krausea@ethz.ch)
Pseudocode Yes Algorithm 1 STREAMING LOCAL SEARCH for a collection of independence systems; Algorithm 2 STREAMING LOCAL SEARCH for independence systems I and d knapsacks. (A simplified sketch of the Algorithm 1 cascade appears after the table.)
Open Source Code Yes Our code is available at github.com/baharanm/non-mon-stream
Open Datasets Yes For our experiments, we use the Open Video Project (OVP) and YouTube datasets, with 50 and 39 videos respectively (De Avila et al. 2011).
Dataset Splits No The parameters, U and W or just W, are learned on 80% of the videos, selected uniformly at random. By the construction of Gong et al. (2014), we have det(L) > 0. However, det(L) can take values less than 1 (so the log-determinant objective can be negative), and the function is non-monotone. We added a positive constant to the function values to make them non-negative; a sketch of this shift appears after the table. Following Gong et al. (2014), for evaluation we treat each of the 5 human-created summaries per video as ground truth.
Hardware Specification No No specific hardware details (e.g., CPU, GPU models, memory specifications, or cloud instances) used for running experiments are mentioned in the paper.
Software Dependencies No The paper mentions algorithms and models used (e.g., 'linear transformation', 'one-hidden-layer neural network', 'Viola-Jones algorithm', 'multiclass support vector machine'), but does not provide specific version numbers for any software libraries, frameworks, or programming languages used for implementation.
Experiment Setup Yes For parametrization, we follow (Gong et al. 2014) and use both a linear transformation, i.e., L_ij = v_i^T W^T W v_j, and a non-linear transformation using a one-hidden-layer neural network, i.e., L_ij = z_i^T W^T W z_j, where z_i = tanh(U v_i) and tanh(·) is the hyperbolic tangent transfer function. The parameters, U and W or just W, are learned on 80% of the videos, selected uniformly at random. Table 1: Performance of various video summarization methods with segment size 10 on YouTube and OVP datasets, measured by F-Score (F), Precision (P), and Recall (R). In our second experiment, we show how constraints can be applied to generate customized summaries... Here we considered 3 uniform matroid constraints to limit the number of frames chosen containing each of the judges, i.e., I = {S ⊆ V : |S ∩ V_j| ≤ l_j}, where V_j ⊆ V is the subset of frames containing judge j and j ∈ [1, 3]; the V_j can overlap. The limits for all the matroid constraints are l_j = 3. Finally, we also compared the three algorithms with a standard, non-sequential DPP as the utility function, for generating summaries of length 5% of the video length.
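The pseudocode row above refers to the cascade structure of STREAMING LOCAL SEARCH (Algorithm 1). Below is a minimal Python sketch of that cascade, simplified to a single cardinality constraint k in place of general independence systems and knapsacks; the names streaming_local_search, f, k, and num_instances are illustrative, not the authors' API.

def streaming_local_search(stream, f, k, num_instances=3):
    # Cascade of streaming instances: an element rejected by
    # instance i is offered to instance i+1; return the best solution.
    solutions = [[] for _ in range(num_instances)]
    for e in stream:
        for S in solutions:
            # keep e if it fits the constraint and has positive marginal gain
            if len(S) < k and f(S + [e]) > f(S):
                S.append(e)
                break  # e is consumed by this instance
            # otherwise e cascades to the next instance
    return max(solutions, key=f)

# Toy usage with a non-monotone "coverage minus cost" objective:
universe = {0: {1, 2}, 1: {2, 3}, 2: {1, 2, 3}, 3: {4}}
f = lambda S: len(set().union(*(universe[e] for e in S))) - 0.5 * len(S)
print(streaming_local_search(range(4), f, k=2))  # e.g. [2, 3]

The cascade is what makes the method work for non-monotone objectives: an element that hurts one partial solution may still be valuable downstream, so rejected elements are re-offered to later instances rather than discarded outright.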
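For the dataset-splits row, the non-negativity shift can be made concrete with a hedged NumPy sketch. This assumes the utility is the shifted log-determinant f(S) = log det(L_S), which is negative whenever det(L_S) < 1; the kernel L, the index set S, and the shift value are placeholders, since the paper does not report the exact constant added.

import numpy as np

def dpp_utility(L, S, shift=10.0):
    # Shifted log-determinant of the principal submatrix L_S.
    # shift is an assumed placeholder, not the authors' reported value.
    if len(S) == 0:
        return shift  # det of the empty submatrix is 1, and log 1 = 0
    sub = L[np.ix_(S, S)]                  # rows and columns indexed by S
    sign, logdet = np.linalg.slogdet(sub)  # stable log det for PSD kernels
    return logdet + shift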
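Finally, the experiment-setup row describes two kernel parametrizations and three uniform matroid constraints; the sketch below implements those formulas under stated assumptions. V holds one frame feature vector v_i per row, W and U stand for the learned parameters, and frames_with_judge / limits encode the judge constraints with l_j = 3; all variable names are illustrative.

import numpy as np

def linear_kernel(V, W):
    # L_ij = v_i^T W^T W v_j, computed for all pairs at once.
    Z = V @ W.T  # project each feature: W v_i
    return Z @ Z.T

def nonlinear_kernel(V, U, W):
    # L_ij = z_i^T W^T W z_j with z_i = tanh(U v_i) (one hidden layer).
    Z = np.tanh(V @ U.T) @ W.T
    return Z @ Z.T

def feasible(S, frames_with_judge, limits=(3, 3, 3)):
    # Uniform matroid constraints: |S ∩ V_j| <= l_j for each judge j.
    return all(len(S & frames_with_judge[j]) <= limits[j]
               for j in range(len(limits)))

With limits of 3 per judge, feasible rejects any candidate summary containing more than three frames of a given judge, which is how the customized summaries in the second experiment are constrained.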