Movie Summarization via Sparse Graph Construction

Authors: Pinelopi Papalampidi, Frank Keller, Mirella Lapata

AAAI 2021, pp. 13631-13639

Reproducibility assessment. Each entry below gives the reproducibility variable, the result, and the supporting LLM response.

Research Type: Experimental
LLM Response: We summarize full-length movies by creating shorter videos containing their most informative scenes. We explore the hypothesis that a summary can be created by assembling scenes which are turning points (TPs), i.e., key events in a movie that describe its storyline. We propose a model that identifies TP scenes by building a sparse movie graph that represents relations between scenes and is constructed using multimodal information. According to human judges, the summaries created by our approach are more informative and complete, and receive higher ratings, than the outputs of sequence-based models and general-purpose summarization algorithms. Our experiments were designed to answer three questions: (1) Is the proposed graph-based model better at identifying TPs compared to less structure-aware variants? (2) To what extent are graphs and multimodal information helpful? and (3) Are the summaries produced by automatically identified TPs meaningful? Table 2 addresses our first question. We perform 5-fold cross-validation over 38 gold-standard movies to obtain a test-development split and evaluate model performance in terms of three metrics: Total Agreement (TA), i.e., the percentage of TP scenes that are correctly identified, Partial Agreement (PA), i.e., the percentage of TP events for which at least one gold-standard scene is identified, and Distance (D)...

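A minimal sketch of the sparsification idea mentioned in this entry: every scene keeps at most C of its most strongly related scenes as neighbours (C = 6 in the experiment setup quoted further below) and all other edges are dropped. This is an illustration rather than the authors' implementation; in the paper, neighbour selection is learned jointly with TP identification, whereas the cosine-similarity scoring, the function name, and the tensor shapes here are assumptions.

```python
import torch
import torch.nn.functional as F

def build_sparse_scene_graph(scene_reprs, max_neighbors=6):
    """Illustrative top-C sparsification of a fully connected scene graph.

    scene_reprs: (num_scenes, dim) tensor of multimodal scene representations
    (assumed to be precomputed). Returns a (num_scenes, num_scenes) adjacency
    matrix that is zero except for each scene's C strongest connections.
    """
    num_scenes = scene_reprs.size(0)
    k = min(max_neighbors, num_scenes - 1)

    normed = F.normalize(scene_reprs, dim=-1)
    sim = normed @ normed.t()               # pairwise cosine similarities, (N, N)
    sim.fill_diagonal_(float("-inf"))       # keep self-loops out of the top-C choice

    topk = sim.topk(k, dim=1)
    adj = torch.zeros_like(sim)
    # Edge weights normalized over the retained neighbours only.
    adj.scatter_(1, topk.indices, torch.softmax(topk.values, dim=1))
    return adj
```
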
Researcher Affiliation: Academia
LLM Response: Pinelopi Papalampidi, Frank Keller and Mirella Lapata; Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh; p.papalampidi@sms.ed.ac.uk, {keller,mlap}@inf.ed.ac.uk

Pseudocode: No
LLM Response: The paper describes its model and process using natural language and diagrams, but it does not provide any formal pseudocode or algorithm blocks.

Open Source Code: Yes
LLM Response: We make our data and code publicly available at https://github.com/ppapalampidi/GraphTP.

Open Datasets: Yes
LLM Response: We performed experiments on the TRIPOD dataset (Papalampidi, Keller, and Lapata 2019) originally used for analyzing the narrative structure of movies. We augmented this dataset by collecting gold-standard annotations for 23 new movies which we added to the test set. The resulting dataset contains 17,150 scenes from 122 movies, 38 of which have gold-standard scene-level annotations and were used for evaluation purposes. The dataset is available at https://github.com/ppapalampidi/TRIPOD.

Dataset Splits: Yes
LLM Response: We perform 5-fold cross-validation over 38 gold-standard movies to obtain a test-development split and evaluate model performance in terms of three metrics: Total Agreement (TA), i.e., the percentage of TP scenes that are correctly identified, Partial Agreement (PA), i.e., the percentage of TP events for which at least one gold-standard scene is identified, and Distance (D), i.e., the minimum distance in number of scenes between the predicted and gold-standard set of scenes for a given TP, normalized by the screenplay length (see Appendix for a more detailed definition of the evaluation metrics).

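The three metrics above (TA, PA, D) are defined precisely in the paper's appendix; the sketch below is only a plausible reading of the descriptions quoted here, and all names (evaluate_tp_predictions, the dict layout, the fallback for empty predictions) are illustrative assumptions.

```python
def evaluate_tp_predictions(predicted, gold, num_scenes):
    """Approximate TA / PA / D for one movie (illustrative, not the official script).

    predicted, gold: dict mapping a TP index (e.g. 1..5) -> set of scene indices.
    num_scenes: screenplay length, used to normalize the distance metric D.
    """
    ta, pa, dist = [], [], []
    for tp, gold_scenes in gold.items():
        pred_scenes = predicted.get(tp, set())
        # TA: fraction of gold-standard TP scenes that were correctly identified.
        ta.append(len(pred_scenes & gold_scenes) / len(gold_scenes))
        # PA: at least one gold-standard scene identified for this TP event.
        pa.append(1.0 if pred_scenes & gold_scenes else 0.0)
        # D: minimum scene distance between predicted and gold sets, normalized
        # by screenplay length (worst case assumed if nothing was predicted).
        d = (min(abs(p - g) for p in pred_scenes for g in gold_scenes)
             if pred_scenes else num_scenes)
        dist.append(d / num_scenes)
    n = len(gold)
    return sum(ta) / n, sum(pa) / n, sum(dist) / n

# Example call: three TPs predicted in a 120-scene screenplay (made-up numbers).
ta, pa, d = evaluate_tp_predictions(
    predicted={1: {10}, 2: {55, 56}, 3: {100}},
    gold={1: {9, 10}, 2: {60}, 3: {100, 101}},
    num_scenes=120,
)
```
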
Hardware Specification: No
LLM Response: The paper does not provide specific hardware details such as GPU or CPU models, or the memory specifications used for running the experiments.

Software Dependencies: No
LLM Response: The paper mentions software such as PyTorch, PyTorch Geometric, the Universal Sentence Encoder, ResNeXt-101, and YAMNet, but it does not specify version numbers for these dependencies.

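Since no versions are reported, the snippet below only illustrates how the named components are typically pulled in, without pinning anything; the TensorFlow Hub URLs are the standard public modules for the Universal Sentence Encoder and YAMNet, and the rest of the pipeline (feature extraction, caching) is not shown.

```python
# Unpinned illustration of the stack named above; the paper reports no versions,
# so none are assumed here.
import torch            # model implementation
import torch_geometric  # graph neural network layers
import tensorflow_hub as hub

# Textual scene features: Universal Sentence Encoder (public TF-Hub module).
sentence_encoder = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

# Audio features: YAMNet (public TF-Hub module).
yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

# Visual features come from a ResNeXt-101 backbone; the exact pretrained
# checkpoint is not specified in the paper, so it is omitted here.
```
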
Experiment Setup: Yes
LLM Response: For training our model we set the hyperparameter λ in Eq. (6) to 10. We used the Adam algorithm (Kingma and Ba 2014) for optimizing our networks. We chose an LSTM with 64 neurons for encoding scenes in the screenplay and an identical one for contextualizing them. We also added a dropout of 0.2. Moreover, we set the maximum size of neighbors C that can be selected for a scene in graph G to 6.

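The entry above gives concrete hyperparameters (64-unit LSTM scene encoder and contextualizer, dropout 0.2, Adam, λ = 10 in Eq. (6), at most C = 6 neighbours per scene). Since Eq. (6) itself is not reproduced here, the loss weighting below is only schematic, and the module and variable names, the input dimensionality, and the default Adam learning rate are assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

HIDDEN = 64        # LSTM size for both the scene encoder and the contextualizer
DROPOUT = 0.2
LAMBDA = 10.0      # weight of the extra term in the paper's Eq. (6)
MAX_NEIGHBORS = 6  # C, maximum neighbours per scene (used by the graph sketch above)

class SceneEncoder(nn.Module):
    """Schematic two-level encoder: one LSTM within each scene, one across scenes."""

    def __init__(self, input_dim):
        super().__init__()
        self.sentence_lstm = nn.LSTM(input_dim, HIDDEN, batch_first=True)
        self.scene_lstm = nn.LSTM(HIDDEN, HIDDEN, batch_first=True)
        self.dropout = nn.Dropout(DROPOUT)

    def forward(self, sentence_feats):
        # sentence_feats: (num_scenes, num_sentences, input_dim)
        _, (scene_vecs, _) = self.sentence_lstm(sentence_feats)
        scenes = self.dropout(scene_vecs[-1]).unsqueeze(0)  # (1, num_scenes, HIDDEN)
        contextualized, _ = self.scene_lstm(scenes)
        return contextualized.squeeze(0)                    # (num_scenes, HIDDEN)

model = SceneEncoder(input_dim=512)               # input_dim is an assumption
optimizer = torch.optim.Adam(model.parameters())  # learning rate not reported

# Schematic objective: the TP prediction loss plus a lambda-weighted term,
# following the role of Eq. (6) in the paper (the equation is not reproduced here):
#   loss = tp_loss + LAMBDA * auxiliary_loss
```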