Improved Variational Autoencoders for Text Modeling using Dilated Convolutions
Authors: Zichao Yang, Zhiting Hu, Ruslan Salakhutdinov, Taylor Berg-Kirkpatrick
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we find that there is a trade-off between contextual capacity of the decoder and effective use of encoding information. We demonstrate perplexity gains on two datasets, representing the first positive language modeling result with VAE. Further, we conduct an in-depth investigation of the use of VAE (with our new decoding architecture) for semi-supervised and unsupervised labeling tasks, demonstrating gains over several strong baselines. (A hedged sketch of such a dilated decoder follows this table.) |
| Researcher Affiliation | Academia | Carnegie Mellon University. Correspondence to: Zichao Yang <zichaoy@cs.cmu.edu>. |
| Pseudocode | No | The paper describes the model architecture and training procedures in detail but does not provide pseudocode or a clearly labeled algorithm block. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We use two large scale document classification data sets: Yahoo Answer and Yelp15 review, representing topic classification and sentiment classification data sets respectively (Tang et al., 2015; Yang et al., 2016; Zhang et al., 2015). |
| Dataset Splits | Yes | The original data sets contain millions of samples, of which we sample 100k as training and 10k as validation and test from the respective partitions. |
| Hardware Specification | No | The paper discusses model configurations and training details but does not provide any specific hardware details such as GPU or CPU models, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using Adam for optimization but does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | We use a vocabulary size of 20k for both data sets and set the word embedding dimension to be 512. The LSTM dimension is 1024. ... We use Adam (Kingma & Ba, 2014) to optimize all models and the learning rate is selected from [2e-3, 1e-3, 7.5e-4] and β1 is selected from [0.5, 0.9]. ... We select the dropout ratio of the LSTMs (both encoder and decoder) from [0.3, 0.5]. ... We use a batch size of 32 and all models are trained for 40 epochs. (A hedged configuration sketch capturing these settings also follows this table.) |
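
The decoder trade-off quoted under Research Type hinges on the dilated causal convolutions named in the paper's title: stacking convolutions with growing dilation widens the decoder's receptive field (its "contextual capacity"), which the authors vary against how much the latent code is used. The sketch below is a minimal, hypothetical PyTorch rendering of such a decoder stack; the channel width, kernel size, and dilation schedule are illustrative assumptions, not the authors' released configuration.

```python
# Hedged sketch (not the authors' code): a minimal stack of dilated causal
# Conv1d layers, illustrating how the dilation schedule controls the
# decoder's contextual capacity (receptive field).
import torch
import torch.nn as nn


class DilatedCausalDecoder(nn.Module):
    def __init__(self, channels=512, kernel_size=3, dilations=(1, 2, 4, 8)):
        super().__init__()
        layers = []
        for d in dilations:
            # Left-pad so each position only attends to past tokens (causal).
            layers.append(nn.ConstantPad1d(((kernel_size - 1) * d, 0), 0.0))
            layers.append(nn.Conv1d(channels, channels, kernel_size, dilation=d))
            layers.append(nn.ReLU())
        self.net = nn.Sequential(*layers)
        # Receptive field of the stacked dilated convolutions:
        # 1 + (kernel_size - 1) * sum(dilations) time steps.
        self.receptive_field = 1 + (kernel_size - 1) * sum(dilations)

    def forward(self, x):  # x: (batch, channels, seq_len)
        return self.net(x)


# A shallower dilation schedule gives a weaker decoder, which (per the paper's
# finding) pushes more information into the latent code; deeper schedules
# approach LSTM-like contextual capacity.
decoder = DilatedCausalDecoder(dilations=(1, 2, 4, 8))
print(decoder.receptive_field)  # 31 time steps for this schedule
```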
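
For the Experiment Setup row, the reported hyperparameters and search ranges can be summarized as a small configuration grid. The snippet below is a hedged sketch under the assumption of a plain dictionary-based config; the key names and the `configurations` helper are hypothetical, and the VAE training loop itself is not specified in the quoted excerpt.

```python
# Hedged sketch of the reported experiment setup: vocab 20k, 512-d embeddings,
# 1024-d LSTM, Adam with lr in {2e-3, 1e-3, 7.5e-4} and beta1 in {0.5, 0.9},
# LSTM dropout in {0.3, 0.5}, batch size 32, 40 training epochs.
from itertools import product

BASE_CONFIG = {
    "vocab_size": 20_000,
    "embedding_dim": 512,
    "lstm_dim": 1024,
    "batch_size": 32,
    "epochs": 40,
}

SEARCH_SPACE = {
    "learning_rate": [2e-3, 1e-3, 7.5e-4],
    "adam_beta1": [0.5, 0.9],
    "lstm_dropout": [0.3, 0.5],
}


def configurations():
    """Yield one full config per point in the reported search grid."""
    keys = list(SEARCH_SPACE)
    for values in product(*(SEARCH_SPACE[k] for k in keys)):
        yield {**BASE_CONFIG, **dict(zip(keys, values))}


for cfg in configurations():
    # Placeholder: the actual VAE training loop is not described in the quote.
    print(cfg)
```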