Story Realization: Expanding Plot Events into Sentences

Authors: Prithviraj Ammanabrolu, Ethan Tien, Wesley Cheung, Zhaochen Luo, William Ma, Lara J. Martin, Mark O. Riedl (pp. 7375-7382)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide results including a human subjects study for a full end-to-end automated story generation system showing that our method generates more coherent and plausible stories than baseline approaches. We perform two sets of experiments, one set evaluating our models on the event-to-sentence problem by itself, and another set intended to evaluate the full storytelling pipeline."
Researcher Affiliation | Academia | Prithviraj Ammanabrolu, Ethan Tien, Wesley Cheung, Zhaochen Luo, William Ma, Lara J. Martin, Mark O. Riedl; School of Interactive Computing, Georgia Institute of Technology; {raj.ammanabrolu, etien, wcheung8, zluo, wma61, ljmartin, riedl}@gatech.edu
Pseudocode | No | The paper describes various algorithms in prose but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Code to reproduce our experiments is available at https://github.com/rajammanabrolu/StoryRealization"
Open Datasets | No | The paper states that it "scraped long-running science fiction TV show plot summaries from the fandom wiki service wikia.com" and then pre-processed and "eventified" this data to create their corpus. However, it does not provide concrete access (link, DOI, repository, or citation) to the specific processed dataset used for training and evaluation.
Dataset Splits | Yes | "After the data is fully prepared, it is split in an 8:1:1 ratio to create the training, validation, and testing sets, respectively."
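The 8:1:1 split described above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name, the shuffling step, and the fixed seed are assumptions.

```python
import random

def split_dataset(examples, seed=0):
    """Split a list of examples into train/validation/test sets
    with an 8:1:1 ratio. Shuffling with a fixed seed is an assumption
    made here for reproducibility; the paper only states the ratio."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(list(range(100)))
# yields 80 / 10 / 10 examples respectively
```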
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed machine specifications) used for running its experiments.
Software Dependencies | No | The paper mentions software such as the Stanford Parser and the AWD-LSTM architecture but does not specify version numbers for these or any other ancillary software components.
Experiment Setup | Yes | "After the models are trained, we pick the cascading thresholds for the ensemble by running the validation set through each of the models and generating confidence scores. This is done by running a grid search through a limited set of thresholds such that the overall BLEU-4 score (Papineni et al. 2002) of the generated sentences in the validation set is maximized. These thresholds are then frozen when running the final set of evaluations on the test set. For the baseline sequence-to-sequence method, we decode our output with a beam size of 5."
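The cascading-threshold grid search quoted above can be sketched as follows. This is a hedged illustration, not the repository's implementation: the model interface (a callable returning a sentence and a confidence score), the fallback to the last model in the cascade, and the toy `score_fn` standing in for corpus-level BLEU-4 are all assumptions.

```python
import itertools

def cascade_generate(event, models, thresholds):
    """Try models in a fixed cascade order; return the first output
    whose confidence clears that model's threshold, falling back to
    the final model unconditionally. The (sentence, confidence)
    interface is an assumption for this sketch."""
    for model, tau in zip(models[:-1], thresholds):
        sentence, confidence = model(event)
        if confidence >= tau:
            return sentence
    sentence, _ = models[-1](event)
    return sentence

def grid_search_thresholds(models, val_events, score_fn, grid):
    """Exhaustively search threshold tuples over `grid`, keeping the
    tuple that maximizes score_fn (BLEU-4 in the paper) on the
    validation outputs. These thresholds would then be frozen for the
    test-set evaluation."""
    best, best_score = None, float("-inf")
    for thresholds in itertools.product(grid, repeat=len(models) - 1):
        outputs = [cascade_generate(e, models, thresholds) for e in val_events]
        score = score_fn(outputs)
        if score > best_score:
            best, best_score = thresholds, score
    return best
```

In practice `score_fn` would compute corpus-level BLEU-4 against the validation references (e.g., via an NLP toolkit), and `grid` would be the paper's "limited set of thresholds".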