Active Video Summarization: Customized Summaries via On-line Interaction with the User

Authors: Ana Garcia del Molino, Xavier Boix, Joo-Hwee Lim, Ah-Hwee Tan

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate AVS on the commonly used UTEgo dataset. We also introduce a new dataset for customized video summarization (CSumm) recorded with a Google Glass. The results show that AVS achieves an excellent compromise between usability and quality. In 41% of the videos, AVS is considered the best over all tested baselines, including manually generated summaries.
Researcher Affiliation | Academia | Ana Garcia del Molino, Xavier Boix, Joo-Hwee Lim, Ah-Hwee Tan. Institute for Infocomm Research, A*STAR, Singapore; School of Computer Science and Engineering, Nanyang Technological University, Singapore; LCSL, Massachusetts Institute of Technology and Istituto Italiano di Tecnologia, MA; Center for Brains, Minds, and Machines, McGovern Institute for Brain Research, Massachusetts Institute of Technology, MA. {stugdma,joohwee}@i2r.a-star.edu.sg, xboix@mit.edu, asahtan@ntu.edu.sg
Pseudocode | Yes | Alg. 1: Active Summarization
Open Source Code | No | The paper does not provide any concrete access information for its source code, nor does it explicitly state that the code will be made open source or is available in supplementary materials.
Open Datasets | Yes | We evaluate AVS on two challenging datasets for video summarization: UTEgo (Lee, Ghosh, and Grauman 2012), which is a commonly used egocentric video dataset, and CSumm, a new dataset for customizable video summarization that we introduce. CSumm contains single-shot unconstrained videos of long duration recorded with a Google Glass.
Dataset Splits | No | The paper does not provide specific details on training/validation/test splits, such as percentages, sample counts, or explicit citations of predefined splits for reproducibility. It describes general evaluation methodologies, such as user studies, but not data partitioning for model training and validation.
Hardware Specification | No | The paper mentions 'Google Glass' and a 'Looxcie camera' as recording devices, and the use of a neural network (AlexNet), which implies computational hardware, but it does not specify any particular GPU models, CPU models, or other hardware used to run the experiments or train the models.
Software Dependencies | No | The paper mentions using 'AlexNet' and a 'Belief Propagation (BP)' implementation, but it does not provide specific version numbers for any software libraries, frameworks, or solvers used in the experiments.
Experiment Setup | Yes | We set α = 5, β = 1 and γ = 1. ... In the active interaction phase, the multiplier K is set to 5. ... Qi is increased or decreased by Δ, which is set to 100 to ensure that the segments selected by the user appear in the summary, and the discarded ones do not. ... The duration is variable depending on the length of the original video. It is set to be around 0.1% of the video length, with a minimum of 10 seconds.
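The quoted setup values can be collected into a small configuration sketch. This is a minimal illustration, assuming a hypothetical AVSConfig container and helper names (target_summary_duration, apply_feedback) that do not appear in the paper; only the numeric values (α, β, γ, K, Δ, the 0.1% duration rule, and the 10-second minimum) come from the quoted text.

```python
from dataclasses import dataclass


@dataclass
class AVSConfig:
    # Objective weights reported in the paper's experiment setup.
    alpha: float = 5.0
    beta: float = 1.0
    gamma: float = 1.0
    # Multiplier used in the active interaction phase.
    K: float = 5.0
    # Step applied to a segment's quality term Qi after user feedback.
    delta: float = 100.0
    # Summary duration rule: about 0.1% of the video length, at least 10 s.
    duration_fraction: float = 0.001
    min_duration_s: float = 10.0


def target_summary_duration(cfg: AVSConfig, video_length_s: float) -> float:
    """Hypothetical helper: summary length implied by the quoted rule."""
    return max(cfg.min_duration_s, cfg.duration_fraction * video_length_s)


def apply_feedback(cfg: AVSConfig, q: float, keep: bool) -> float:
    """Hypothetical helper: raise or lower Qi by Δ after the user keeps or
    discards a segment, so that choice dominates the next summary."""
    return q + cfg.delta if keep else q - cfg.delta


if __name__ == "__main__":
    cfg = AVSConfig()
    # A 3-hour (10,800 s) egocentric video would get a ~10.8 s summary target.
    print(target_summary_duration(cfg, 3 * 3600))
    # A kept segment gets its quality term boosted by Δ = 100.
    print(apply_feedback(cfg, q=0.3, keep=True))
```

The large value of Δ relative to the other weights reflects the quoted intent: user-selected segments are effectively forced into the summary and discarded ones are forced out.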