T-CVAE: Transformer-Based Conditioned Variational Autoencoder for Story Completion
Authors: Tianming Wang, Xiaojun Wan
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform experiments on the benchmark ROCStories dataset. Our model strongly outperforms prior methods and achieves the state-of-the-art performance. Both automatic and manual evaluations show that our model generates better story plots than state-of-the-art models in terms of readability, diversity and coherence. |
| Researcher Affiliation | Academia | Tianming Wang and Xiaojun Wan; Institute of Computer Science and Technology, Peking University; The MOE Key Laboratory of Computational Linguistics, Peking University; {wangtm, wanxiaojun}@pku.edu.cn |
| Pseudocode | No | The paper describes models and formulas but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/sodawater/T-CVAE. |
| Open Datasets | Yes | We perform experiments on the ROCStories dataset [Mostafazadeh et al., 2016a] for evaluating models. The dataset is randomly split by 8:1:1 to get the training, validation and test datasets with 78529, 9817 and 9816 stories respectively. |
| Dataset Splits | Yes (see the split sketch below the table) | The dataset is randomly split by 8:1:1 to get the training, validation and test datasets with 78529, 9817 and 9816 stories respectively. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using 'Adam Optimizer' and '300-dimensional Glove word vectors' but does not specify version numbers for these or any other software dependencies, such as programming languages or libraries. |
| Experiment Setup | Yes (see the hyperparameter sketch below the table) | We set our model parameters based on preliminary experiments on the development data. For all models including baselines, d_model is set to 512 and d_emb is set to 300. For Transformer models, the number of attention heads H is set to 8 and the number of Transformer blocks L is set to 6. The number of LSTM layers is set to 2. For VAE models, d_z is set to 64 and the annealing step is set to 20000. We apply dropout to the output of each sub-layer in Transformer blocks. We use a rate P_drop = 0.15 for all models. We use the Adam Optimizer with an initial learning rate of 10^-4, momentum β1 = 0.9, β2 = 0.99 and weight decay ϵ = 10^-9. The batch size is set to 64. |
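The 8:1:1 split quoted in the "Dataset Splits" row fully determines the reported partition sizes (78,529 + 9,817 + 9,816 = 98,162 stories). Below is a minimal sketch of how such a split could be reproduced in Python; the random seed, the rounding of the two 10% partitions, and the function name `split_rocstories` are assumptions, since the paper only states the ratio and the resulting counts.

```python
import random

def split_rocstories(stories, seed=42):
    """Randomly split the ROCStories corpus 8:1:1 into train/val/test.

    Assumption: the seed and the exact rounding are not given in the paper,
    which only reports an 8:1:1 random split yielding 78,529 / 9,817 / 9,816
    stories out of 98,162 in total.
    """
    stories = list(stories)
    random.Random(seed).shuffle(stories)
    n = len(stories)                    # 98,162 stories in total
    n_train = int(n * 0.8)              # 78,529
    n_val = (n - n_train + 1) // 2      # 9,817 (validation takes the odd story)
    train = stories[:n_train]
    val = stories[n_train:n_train + n_val]
    test = stories[n_train + n_val:]    # 9,816
    return train, val, test
```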
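For reference, the hyperparameters quoted in the "Experiment Setup" row can be collected into a single configuration object. This is a hedged sketch rather than the authors' code: the dataclass schema and field names are hypothetical, and only the numeric values come from the paper, which does not name its deep-learning framework.

```python
from dataclasses import dataclass

@dataclass
class TCVAEConfig:
    """Hyperparameter values reported in the paper's experiment setup.

    Assumption: field names and this schema are illustrative; the paper
    lists the values in prose, not as a configuration file.
    """
    d_model: int = 512           # hidden size for all models
    d_emb: int = 300             # 300-dimensional GloVe word embeddings
    num_heads: int = 8           # attention heads H (Transformer models)
    num_layers: int = 6          # Transformer blocks L
    num_lstm_layers: int = 2     # for LSTM baselines
    d_z: int = 64                # latent dimension for VAE models
    kl_anneal_steps: int = 20000 # annealing step for VAE training
    dropout: float = 0.15        # P_drop on each sub-layer output
    learning_rate: float = 1e-4  # Adam initial learning rate
    adam_beta1: float = 0.9
    adam_beta2: float = 0.99
    weight_decay: float = 1e-9   # reported as "weight decay ϵ = 10^-9"
    batch_size: int = 64

config = TCVAEConfig()
```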