Neural Variational Inference for Text Processing

Authors: Yishu Miao, Lei Yu, Phil Blunsom

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate this framework on two very different text modelling applications, generative document modelling and supervised question answering. Our neural variational document model combines a continuous stochastic document representation with a bag-of-words generative model and achieves the lowest reported perplexities on two standard test corpora. The neural answer selection model employs a stochastic representation layer within an attention mechanism to extract the semantics between a question and answer pair. On two question answering benchmarks this model exceeds all previous published benchmarks.
Researcher Affiliation | Collaboration | 1University of Oxford, 2Google DeepMind
Pseudocode | No | The paper describes the models and framework in detail, but it does not include a specific pseudocode block or algorithm section.
Open Source Code | No | The paper does not provide any links to source code or explicitly state that source code is being released.
Open Datasets | Yes | We experiment with NVDM on two standard news corpora: the 20News Groups[2] and the Reuters RCV1-v2[3]. The former is a collection of newsgroup documents, consisting of 11,314 training and 7,531 test articles. The latter is a large collection from Reuters newswire stories with 794,414 training and 10,000 test cases. ... [2] http://qwone.com/~jason/20Newsgroups [3] http://trec.nist.gov/data/reuters/reuters.html
Dataset Splits | Yes | The model is trained by Adam (Kingma & Ba, 2015) and tuned by hold-out validation perplexity. ... with hyperparameters selected by optimising the MAP score on the development set.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for the experiments. It only vaguely mentions 'on GPU'.
Software Dependencies | No | The paper mentions 'Adam' as the optimizer and the 'word2vec tool' but does not specify version numbers for any key software components or libraries used for implementation.
Experiment Setup | Yes | We train NVDM models with 50 and 200 dimensional document representations respectively. For the inference network, we use an MLP (Eq. 8) with 2 layers and 500 dimension rectifier linear units... During training we carry out stochastic estimation by taking one sample for estimating the stochastic gradients, while in prediction we use 20 samples for predicting document perplexity. The model is trained by Adam (Kingma & Ba, 2015) and tuned by hold-out validation perplexity. ... We use LSTMs with 3 layers and 50 hidden units, and apply 40% dropout after the embedding layer. For the construction of the inference network, we use an MLP (Eq. 10) with 2 layers and tanh units of 50 dimension, and an MLP (Eq. 17) with 2 layers and tanh units of 150 dimension for modelling the joint representation. ... The models are trained using Adam (Kingma & Ba, 2015), with hyperparameters selected by optimising the MAP score on the development set.
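
To make the Open Datasets row actionable, the sketch below shows one minimal way to obtain the 20 Newsgroups corpus and build the bag-of-words counts an NVDM-style model consumes. It is an illustration rather than the authors' pipeline: it assumes scikit-learn's mirror of the bydate split (whose 11,314/7,531 train/test counts match the figures quoted from the paper, which links to http://qwone.com/~jason/20Newsgroups), and the 2,000-word vocabulary is an arbitrary placeholder since the paper's exact preprocessing is not specified.

# Minimal sketch, not the authors' pipeline: load 20 Newsgroups (bydate split)
# and build sparse bag-of-words counts.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

train = fetch_20newsgroups(subset="train")   # 11,314 training articles
test = fetch_20newsgroups(subset="test")     # 7,531 test articles

# Vocabulary size and stop-word handling are assumptions; the paper does not
# state its preprocessing.
vectorizer = CountVectorizer(max_features=2000, stop_words="english")
x_train = vectorizer.fit_transform(train.data)   # sparse document-term counts
x_test = vectorizer.transform(test.data)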
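
The Experiment Setup row pins down enough of the NVDM configuration to sketch it in code. The PyTorch snippet below is a hedged reconstruction, not the authors' implementation: the 2-layer 500-unit ReLU inference MLP, the 50- or 200-dimensional Gaussian document representation, the 1-sample training / 20-sample prediction scheme, and the Adam optimiser come from the quoted text, while module names, the vocabulary size, and the learning rate are assumptions.

# Hedged PyTorch sketch of the NVDM inference network and sampling scheme
# described above. Only layer sizes, activation, sample counts, and optimiser
# come from the quoted setup; names and the learning rate are illustrative.
import torch
import torch.nn as nn

class NVDMInferenceNetwork(nn.Module):
    def __init__(self, vocab_size: int, latent_dim: int = 50):  # 50 or 200 dims
        super().__init__()
        # "an MLP (Eq. 8) with 2 layers and 500 dimension rectifier linear units"
        self.mlp = nn.Sequential(
            nn.Linear(vocab_size, 500), nn.ReLU(),
            nn.Linear(500, 500), nn.ReLU(),
        )
        # Diagonal Gaussian over the continuous document representation.
        self.mean = nn.Linear(500, latent_dim)
        self.log_var = nn.Linear(500, latent_dim)

    def forward(self, bow: torch.Tensor, n_samples: int = 1) -> torch.Tensor:
        # bow: (batch, vocab_size) bag-of-words counts.
        h = self.mlp(bow)
        mu, log_var = self.mean(h), self.log_var(h)
        # Reparameterised samples: 1 during training, 20 when estimating
        # test perplexity, as stated in the quoted setup.
        eps = torch.randn(n_samples, *mu.shape)
        return mu + eps * torch.exp(0.5 * log_var)

model = NVDMInferenceNetwork(vocab_size=2000, latent_dim=50)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam per the paper; lr is a placeholder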