Story Realization: Expanding Plot Events into Sentences
Authors: Prithviraj Ammanabrolu, Ethan Tien, Wesley Cheung, Zhaochen Luo, William Ma, Lara J. Martin, Mark O. Riedl
AAAI 2020, pp. 7375-7382
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide results including a human subjects study for a full end-to-end automated story generation system showing that our method generates more coherent and plausible stories than baseline approaches. We perform two sets of experiments, one set evaluating our models on the event-to-sentence problem by itself, and another set intended to evaluate the full storytelling pipeline. |
| Researcher Affiliation | Academia | Prithviraj Ammanabrolu, Ethan Tien, Wesley Cheung, Zhaochen Luo, William Ma, Lara J. Martin, Mark O. Riedl School of Interactive Computing Georgia Institute of Technology {raj.ammanabrolu, etien, wcheung8, zluo, wma61, ljmartin, riedl}@gatech.edu |
| Pseudocode | No | The paper describes various algorithms in prose but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to reproduce our experiments is available at https://github.com/rajammanabrolu/StoryRealization |
| Open Datasets | No | The paper states that it 'scraped long-running science fiction TV show plot summaries from the fandom wiki service wikia.com' and then pre-processed and 'eventified' this data to create their corpus. However, it does not provide concrete access (link, DOI, repository, or citation) to this specific processed dataset they used for training and evaluation. |
| Dataset Splits | Yes | After the data is fully prepared, it is split in an 8:1:1 ratio to create the training, validation, and testing sets, respectively. (A minimal split sketch appears below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software such as the 'Stanford Parser' and the 'AWD-LSTM' architecture but does not specify version numbers for these or for any other ancillary software components. |
| Experiment Setup | Yes | After the models are trained, we pick the cascading thresholds for the ensemble by running the validation set through each of the models and generating confidence scores. This is done by running a grid search through a limited set of thresholds such that the overall BLEU-4 score (Papineni et al. 2002) of the generated sentences in the validation set is maximized. These thresholds are then frozen when running the final set of evaluations on the test set. For the baseline sequence-to-sequence method, we decode our output with a beam size of 5. (A hedged sketch of this threshold search appears below the table.) |
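Two short sketches follow for the technical steps quoted above. First, the 8:1:1 train/validation/test split from the Dataset Splits row. This is a minimal illustration only: the paper does not say how the split was randomized, so the fixed-seed shuffle, the function name `split_8_1_1`, and its arguments are assumptions.

```python
import random

def split_8_1_1(examples, seed=0):
    """Split a list of examples into train/valid/test in an 8:1:1 ratio.

    The shuffle-with-fixed-seed policy is an assumption for
    reproducibility of this sketch; the paper only states the ratio.
    """
    rng = random.Random(seed)
    data = list(examples)
    rng.shuffle(data)

    n = len(data)
    n_train = int(0.8 * n)   # 8 parts training
    n_valid = int(0.1 * n)   # 1 part validation

    train = data[:n_train]
    valid = data[n_train:n_train + n_valid]
    test = data[n_train + n_valid:]  # remaining ~1 part testing
    return train, valid, test
```

For example, 1,000 eventified examples would yield roughly 800 training, 100 validation, and 100 test examples.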
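Second, the threshold grid search from the Experiment Setup row. The sketch below assumes each ensemble member exposes a hypothetical `model(event) -> (tokens, confidence)` interface and uses NLTK's `corpus_bleu` (default 4-gram weights) as a stand-in for the paper's BLEU-4 scorer; the cascade order, the candidate threshold set, and all helper names are assumptions, not the authors' code.

```python
import itertools
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def cascade_generate(event, models, thresholds):
    """Run the models in cascade order: return the first output whose
    confidence clears that model's threshold; fall back to the final
    model unconditionally. The (tokens, confidence) return value of
    model(event) is an assumed interface."""
    for model, tau in zip(models[:-1], thresholds):
        tokens, confidence = model(event)
        if confidence >= tau:
            return tokens
    tokens, _ = models[-1](event)
    return tokens

def tune_thresholds(models, val_events, val_refs, candidate_taus):
    """Grid-search cascading thresholds on the validation set so that
    corpus BLEU-4 of the generated sentences is maximized, mirroring the
    procedure the paper describes; the best thresholds are then frozen
    for the test-set evaluation."""
    smooth = SmoothingFunction().method1
    best_taus, best_bleu = None, -1.0
    # One threshold per model except the unconditional fallback.
    for taus in itertools.product(candidate_taus, repeat=len(models) - 1):
        hyps = [cascade_generate(ev, models, taus) for ev in val_events]
        bleu = corpus_bleu([[ref] for ref in val_refs], hyps,
                           smoothing_function=smooth)
        if bleu > best_bleu:
            best_bleu, best_taus = bleu, taus
    return best_taus, best_bleu
```

Exhaustive `itertools.product` enumeration is feasible here only because the paper restricts the search to a limited set of candidate thresholds.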