Dispersed Exponential Family Mixture VAEs for Interpretable Text Generation
Authors: Wenxian Shi, Hao Zhou, Ning Miao, Lei Li
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our approach does obtain a meaningful space, and it outperforms strong baselines in text generation benchmarks. |
| Researcher Affiliation | Industry | ByteDance AI Lab. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | Yes | The code is available at https://github.com/wenxianxian/demvae. |
| Open Datasets | Yes | For generation quality, we use the Penn Treebank (Marcus et al., 1993, PTB) pre-processed by Mikolov (Mikolov et al., 2010) as the benchmark. For interpretability, we use the DailyDialog (Li et al., 2017b, DD) and the Stanford Multi-Domain Dialog (Eric et al., 2017, SMD) datasets. |
| Dataset Splits | Yes | All hyper-parameters, including β, are chosen according to the reverse perplexity (language generation task) or BLEU scores (dialog generation task) on the validation set. The test set of PTB is also included for the comparison of text fluency. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like GRU and Gumbel-softmax, and uses GloVe word embeddings, but it does not specify version numbers for any key software libraries or frameworks needed to replicate the experiment. |
| Experiment Setup | Yes | The encoder and decoder in all models are implemented with a single-layer GRU (Chung et al., 2014) with a hidden size of 512. For VAEs with discrete latent variables, multiple independent variables are adopted in order to increase model capacity. For unsupervised text generation, the dimension of discrete latent variables is set to 5, while the number of discrete latent variables is set to 20, 3, and 3 for PTB, DD, and SMD. The total dimension of the continuous latent space is set to 40 for PTB, 15 for DD, and 48 for SMD. For supervised text generation, the number of discrete variables is 30, the dimension of each variable is set to 8, and the number of mixture components is set to 30. KL annealing with the logistic weight function 1/(1 + exp(−0.0025(step − 2500))) is adopted for all VAE variants. For GM-VAE, the KL annealing is applied to the whole KL term. All hyper-parameters, including β, are chosen according to the reverse perplexity (language generation task) or BLEU scores (dialog generation task) on the validation set. |
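The logistic KL-annealing schedule quoted in the experiment setup is easiest to read as code. Below is a minimal sketch of that weight function in plain Python, assuming the usual reading of 1/(1 + exp(−0.0025(step − 2500))); the function name and keyword defaults are illustrative choices, not identifiers from the paper or its repository.

```python
import math

def kl_anneal_weight(step: int, slope: float = 0.0025, midpoint: int = 2500) -> float:
    """Logistic KL-annealing weight that rises from ~0 toward 1 as training proceeds.

    The constants 0.0025 (slope) and 2500 (midpoint step) are those stated in the
    paper's setup; the KL term is multiplied by this weight during training.
    """
    return 1.0 / (1.0 + math.exp(-slope * (step - midpoint)))

# Usage: the weight is ~0.5 at the midpoint step and approaches 1.0 afterwards.
for step in (0, 2500, 5000, 10000):
    print(step, round(kl_anneal_weight(step), 4))
```

Annealing the KL weight this way is a standard remedy for posterior collapse in text VAEs: the reconstruction term dominates early training, and the KL penalty is phased in smoothly around the midpoint step.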