Towards Coherent and Consistent Use of Entities in Narrative Generation

Authors: Pinelopi Papalampidi, Kris Cao, Tomas Kocisky

ICML 2022

Reproducibility Assessment

Research Type: Experimental
LLM Response: First, we propose a set of automatic metrics for measuring model performance in terms of entity usage. Given these metrics, we quantify the limitations of current LMs. Next, we propose augmenting a pre-trained LM with a dynamic entity memory in an end-to-end manner, using an auxiliary entity-related loss to guide the reads and writes to the memory. We demonstrate that the dynamic entity memory increases entity coherence according to both automatic and human judgment, and helps preserve entity-related information, especially in settings with a limited context window. Finally, we also validate that our automatic metrics are correlated with human ratings and serve as a good indicator of the quality of generated stories.

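The paper's exact equations are not reproduced in this summary, but the mechanism described above can be illustrated. Below is a minimal PyTorch sketch of a dynamic entity memory with attention-based reads and gated writes; the class name, the gating scheme, and all dimensions are our assumptions, not the authors' implementation (which is defined by the paper's Equations 1-6).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EntityMemory(nn.Module):
    """Illustrative dynamic entity memory: attention-based reads and
    gated writes. Not the paper's exact formulation (Equations 1-6)."""

    def __init__(self, num_slots: int, d_model: int, read_temp: float = 0.1):
        super().__init__()
        self.num_slots = num_slots
        # A low temperature sharpens the read distribution, encouraging
        # distinct slots to specialise (cf. the temperature of 0.1 in Eq. 4).
        self.read_temp = read_temp
        self.query = nn.Linear(d_model, d_model)
        self.write_gate = nn.Linear(2 * d_model, 1)

    def read(self, memory: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # memory: (num_slots, d_model); h: (d_model,) current LM hidden state.
        scores = memory @ self.query(h) / self.read_temp      # (num_slots,)
        attn = F.softmax(scores, dim=-1)
        return attn @ memory                                  # (d_model,)

    def write(self, memory: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # Gated interpolation of each slot towards the current hidden state.
        gate = torch.sigmoid(
            self.write_gate(torch.cat([memory, h.expand_as(memory)], dim=-1))
        )                                                     # (num_slots, 1)
        return (1 - gate) * memory + gate * h


mem = EntityMemory(num_slots=8, d_model=64)
memory = torch.zeros(8, 64)
hidden = torch.randn(64)
entity_vec = mem.read(memory, hidden)   # consult memory while decoding
memory = mem.write(memory, hidden)      # update, e.g. every 64 tokens
```
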
Researcher Affiliation: Collaboration
LLM Response: 1 University of Edinburgh, UK; 2 DeepMind, UK.

Pseudocode: No
LLM Response: The paper describes the model architecture and its equations (Equations 1-6) but does not provide formal pseudocode or an algorithm block.

Open Source Code: No
LLM Response: The paper does not contain an explicit statement about releasing the source code for its methodology, nor a link to a code repository.

Open Datasets: Yes
LLM Response: 1. WritingPrompts (WrPr) (Fan et al., 2018): a dataset of Reddit stories written by anonymous authors. 2. WikiPlots (WiPl): https://github.com/markriedl/WikiPlots

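For reference, here is a minimal loader sketch for WikiPlots, assuming the file layout described in that repository's README (a plots file with one sentence per line and stories delimited by <EOS> lines, plus a parallel titles file); the function name and defaults are hypothetical:

```python
def load_wikiplots(plots_path: str = "plots", titles_path: str = "titles"):
    """Pair each Wikipedia plot with its title.

    Assumes the layout described in the WikiPlots README: `plots` holds one
    sentence per line with stories delimited by `<EOS>` lines, and `titles`
    holds one title per line, in the same order.
    """
    with open(titles_path, encoding="utf-8") as f:
        titles = [line.strip() for line in f]

    stories, current = [], []
    with open(plots_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line == "<EOS>":
                stories.append(" ".join(current))
                current = []
            elif line:
                current.append(line)
    if current:  # tolerate a missing final <EOS>
        stories.append(" ".join(current))
    return list(zip(titles, stories))
```
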
Dataset Splits: Yes
LLM Response: In our main experimental setup we compare the (base) VANILLALM with our model variants, MNEMELM augmented with a static or a dynamic entity memory. All models have access to a long enough context (considering both the current context and the T-XL memory) to fit the entity prompt and the whole narrative. However, we also consider experimental settings where the models have access to a limited narrative context... We set the sequence length to 512 and the T-XL memory to 500, for a total context window of 1012. Next, we decrease the T-XL memory to simulate scenarios with a limited narrative context, investigating sizes in the set [100, 50, 10].

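As a concrete reading of those numbers, a tiny arithmetic sketch (plain Python, values taken verbatim from the quoted setup) of the resulting total context windows:

```python
# Sequence length 512 plus a Transformer-XL memory of 500 gives the full
# 1012-token context; reduced T-XL memories simulate limited context.
SEQ_LEN = 512
for txl_mem in [500, 100, 50, 10]:
    print(f"T-XL memory = {txl_mem:>3} -> total context window = {SEQ_LEN + txl_mem}")
# prints totals of 1012, 612, 562 and 522 tokens respectively
```
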
Hardware Specification: Yes
LLM Response: Finally, we use 32 TPU-v3 chips for training our models for 450k steps and 1 TPU-v3 chip for evaluation, with a batch size of 2 per core.

Software Dependencies: No
LLM Response: The paper mentions tools and models such as Transformer-XL, nucleus sampling, and an end-to-end coreference tool (Lee et al., 2018), but does not provide specific version numbers for software dependencies.

Experiment Setup: Yes
LLM Response: For generating stories, we use nucleus sampling with p = 0.8 and temperature 1. See Appendix A.2 for details... For updating the entity memory in D-MNEMELM, we consider intervals of 64 tokens in the narrative per update. Moreover, we set the temperature in Equation 4 to 0.1 to encourage the model to produce distinct representations for different entity memory slots... In our main experimental setting, where all LMs have access to the full narrative context, we set the sequence length to 512 and the T-XL memory to 500, for a total context window of 1012.
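
To make the decoding setting concrete, here is a self-contained sketch of nucleus (top-p) sampling with the reported p = 0.8 and temperature 1; the function name and tensor handling are our own, not the authors' code.

```python
import torch

def nucleus_sample(logits: torch.Tensor, p: float = 0.8,
                   temperature: float = 1.0) -> int:
    """Top-p (nucleus) sampling: sample from the smallest set of tokens
    whose cumulative probability reaches p."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Index of the first token at which the cumulative mass reaches p.
    cutoff = int(torch.searchsorted(cumulative, p)) + 1
    kept = sorted_probs[:cutoff] / sorted_probs[:cutoff].sum()  # renormalise
    choice = torch.multinomial(kept, num_samples=1)
    return int(sorted_idx[choice])

logits = torch.randn(50_000)       # e.g. vocabulary-sized logits
token_id = nucleus_sample(logits)  # reported setting: p = 0.8, temperature 1
```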