Story Realization: Expanding Plot Events into Sentences
Authors: Prithviraj Ammanabrolu, Ethan Tien, Wesley Cheung, Zhaochen Luo, William Ma, Lara J. Martin, Mark O. Riedl
AAAI 2020, pp. 7375-7382
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide results including a human subjects study for a full end-to-end automated story generation system showing that our method generates more coherent and plausible stories than baseline approaches. We perform two sets of experiments, one set evaluating our models on the event-to-sentence problem by itself, and another set intended to evaluate the full storytelling pipeline. |
| Researcher Affiliation | Academia | Prithviraj Ammanabrolu, Ethan Tien, Wesley Cheung, Zhaochen Luo, William Ma, Lara J. Martin, Mark O. Riedl School of Interactive Computing Georgia Institute of Technology {raj.ammanabrolu, etien, wcheung8, zluo, wma61, ljmartin, riedl}@gatech.edu |
| Pseudocode | No | The paper describes various algorithms in prose but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to reproduce our experiments is available at https://github.com/rajammanabrolu/StoryRealization |
| Open Datasets | No | The paper states that it 'scraped long-running science fiction TV show plot summaries from the fandom wiki service wikia.com' and then pre-processed and 'eventified' this data to create their corpus. However, it does not provide concrete access (link, DOI, repository, or citation) to this specific processed dataset they used for training and evaluation. |
| Dataset Splits | Yes | After the data is fully prepared, it is split in an 8:1:1 ratio to create the training, validation, and testing sets, respectively. (A minimal split sketch appears below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software such as the 'Stanford Parser' and the 'AWD-LSTM' architecture but does not specify version numbers for these or for any other ancillary software components. |
| Experiment Setup | Yes | After the models are trained, we pick the cascading thresholds for the ensemble by running the validation set through each of the models and generating confidence scores. This is done by running a grid search through a limited set of thresholds such that the overall BLEU-4 score (Papineni et al. 2002) of the generated sentences in the validation set is maximized. These thresholds are then frozen when running the final set of evaluations on the test set. For the baseline sequence-to-sequence method, we decode our output with a beam size of 5. (A hedged sketch of this threshold search appears below the table.) |
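Two short sketches follow for the technical steps quoted above. First, the 8:1:1 train/validation/test split from the Dataset Splits row. This is a minimal illustration only: the paper does not say how the split was randomized, so the fixed-seed shuffle, the function name `split_8_1_1`, and its arguments are assumptions.

```python
import random

def split_8_1_1(examples, seed=0):
    """Split a list of examples into train/valid/test in an 8:1:1 ratio.

    The shuffle-with-fixed-seed policy is an assumption for
    reproducibility of this sketch; the paper only states the ratio.
    """
    rng = random.Random(seed)
    data = list(examples)
    rng.shuffle(data)

    n = len(data)
    n_train = int(0.8 * n)   # 8 parts training
    n_valid = int(0.1 * n)   # 1 part validation

    train = data[:n_train]
    valid = data[n_train:n_train + n_valid]
    test = data[n_train + n_valid:]  # remaining ~1 part testing
    return train, valid, test
```

For example, 1,000 eventified examples would yield roughly 800 training, 100 validation, and 100 test examples.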
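Second, the threshold grid search from the Experiment Setup row. The sketch below assumes each ensemble member exposes a hypothetical `model(event) -> (tokens, confidence)` interface and uses NLTK's `corpus_bleu` (default 4-gram weights) as a stand-in for the paper's BLEU-4 scorer; the cascade order, the candidate threshold set, and all helper names are assumptions, not the authors' code.

```python
import itertools
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def cascade_generate(event, models, thresholds):
    """Run the models in cascade order: return the first output whose
    confidence clears that model's threshold; fall back to the final
    model unconditionally. The (tokens, confidence) return value of
    model(event) is an assumed interface."""
    for model, tau in zip(models[:-1], thresholds):
        tokens, confidence = model(event)
        if confidence >= tau:
            return tokens
    tokens, _ = models[-1](event)
    return tokens

def tune_thresholds(models, val_events, val_refs, candidate_taus):
    """Grid-search cascading thresholds on the validation set so that
    corpus BLEU-4 of the generated sentences is maximized, mirroring the
    procedure the paper describes; the best thresholds are then frozen
    for the test-set evaluation."""
    smooth = SmoothingFunction().method1
    best_taus, best_bleu = None, -1.0
    # One threshold per model except the unconditional fallback.
    for taus in itertools.product(candidate_taus, repeat=len(models) - 1):
        hyps = [cascade_generate(ev, models, taus) for ev in val_events]
        bleu = corpus_bleu([[ref] for ref in val_refs], hyps,
                           smoothing_function=smooth)
        if bleu > best_bleu:
            best_bleu, best_taus = bleu, taus
    return best_taus, best_bleu
```

Exhaustive `itertools.product` enumeration is feasible here only because the paper restricts the search to a limited set of candidate thresholds.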