T-CVAE: Transformer-Based Conditioned Variational Autoencoder for Story Completion

Authors: Tianming Wang, Xiaojun Wan

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform experiments on the benchmark ROCStories dataset. Our model strongly outperforms prior methods and achieves the state-of-the-art performance. Both automatic and manual evaluations show that our model generates better story plots than state-of-the-art models in terms of readability, diversity and coherence.
Researcher Affiliation | Academia | Tianming Wang and Xiaojun Wan, Institute of Computer Science and Technology, Peking University; The MOE Key Laboratory of Computational Linguistics, Peking University; {wangtm, wanxiaojun}@pku.edu.cn
Pseudocode | No | The paper describes models and formulas but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/sodawater/T-CVAE.
Open Datasets | Yes | We perform experiments on the ROCStories dataset for evaluating models. The dataset is randomly split by 8:1:1 to get the training, validation and test datasets with 78529, 9817 and 9816 stories respectively. [Mostafazadeh et al., 2016a]
Dataset Splits | Yes | The dataset is randomly split by 8:1:1 to get the training, validation and test datasets with 78529, 9817 and 9816 stories respectively.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using the 'Adam Optimizer' and '300-dimensional GloVe word vectors' but does not specify version numbers for these or any other software dependencies, such as programming languages or libraries.
Experiment Setup | Yes | We set our model parameters based on preliminary experiments on the development data. For all models including baselines, d_model is set to 512 and d_emb is set to 300. For Transformer models, the head of attention H is set to 8 and the number of Transformer blocks L is set to 6. The number of LSTM layers is set to 2. For VAE models, d_z is set to 64 and the annealing step is set to 20000. We apply dropout to the output of each sub-layer in Transformer blocks. We use a rate P_drop = 0.15 for all models. We use the Adam Optimizer with an initial learning rate of 10^-4, momentum β1 = 0.9, β2 = 0.99 and weight decay ε = 10^-9. The batch size is set to 64.
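
The Dataset Splits row reports an 8:1:1 random split yielding 78529 / 9817 / 9816 stories, but the paper does not state the split procedure or a random seed. The following is a minimal sketch of one way such a split could be reproduced, assuming a plain-text dump of the corpus; the file name and seed are placeholders, not the authors' choices.

```python
import random

# Minimal 8:1:1 random split sketch (procedure and seed are assumptions;
# the paper only reports the ratio and the resulting set sizes).
random.seed(0)  # placeholder seed, not taken from the paper

# "rocstories.txt" is a hypothetical file with one five-sentence story per line.
with open("rocstories.txt", encoding="utf-8") as f:
    stories = [line.strip() for line in f if line.strip()]

random.shuffle(stories)

n = len(stories)
n_train = int(n * 0.8)
n_valid = int(n * 0.1)

train = stories[:n_train]
valid = stories[n_train:n_train + n_valid]
test = stories[n_train + n_valid:]

# With the 98162 stories implied by the reported counts (78529 + 9817 + 9816),
# this rounding gives 78529 / 9816 / 9817; which tail split receives the extra
# story depends on the exact rounding convention used.
print(len(train), len(valid), len(test))
```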
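
The Experiment Setup row gives the hyperparameters but not the framework or training-loop details. Below is a minimal PyTorch-flavoured sketch of how those values could be wired together; the config dictionary, the stand-in torch.nn.Transformer model, and the reading of the paper's "weight decay ε = 10^-9" as Adam's ε term are all assumptions rather than details of the authors' released implementation (linked above).

```python
import torch

# Hyperparameters as reported in the Experiment Setup row; the key names are
# illustrative and do not come from the released code.
config = {
    "d_model": 512,          # hidden size for all models
    "d_emb": 300,            # embedding size (300-dimensional GloVe vectors)
    "num_heads": 8,          # Transformer attention heads H
    "num_layers": 6,         # Transformer blocks L
    "num_lstm_layers": 2,    # for the LSTM-based baselines
    "d_z": 64,               # latent dimension for the (C)VAE models
    "kl_anneal_steps": 20000,
    "dropout": 0.15,         # P_drop applied to each sub-layer output
    "batch_size": 64,
    "lr": 1e-4,
    "betas": (0.9, 0.99),
    "eps": 1e-9,             # interpreting "weight decay eps = 10^-9" as Adam's eps (assumption)
}

# Stand-in model only; the actual T-CVAE architecture with its latent-variable
# path is not reproduced here.
model = torch.nn.Transformer(
    d_model=config["d_model"],
    nhead=config["num_heads"],
    num_encoder_layers=config["num_layers"],
    num_decoder_layers=config["num_layers"],
    dropout=config["dropout"],
)

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=config["lr"],
    betas=config["betas"],
    eps=config["eps"],
)

def kl_weight(step: int, anneal_steps: int = config["kl_anneal_steps"]) -> float:
    """Linear KL annealing over `anneal_steps` steps; the schedule shape is an
    assumption, since the paper only states the annealing step count."""
    return min(1.0, step / anneal_steps)
```

During training, kl_weight(step) would scale the KL term of the CVAE objective so that the latent space is not collapsed early on; the linear shape here is only one common choice.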