TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency

Authors: Adji B. Dieng, Chong Wang, Jianfeng Gao, John Paisley

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical results on word prediction show that TopicRNN outperforms existing contextual RNN baselines. In addition, TopicRNN can be used as an unsupervised feature extractor for documents. We do this for sentiment analysis on the IMDB movie review dataset and report an error rate of 6.28%."
Researcher Affiliation | Collaboration | Adji B. Dieng (Columbia University, abd2141@columbia.edu); Chong Wang (Deep Learning Technology Center, Microsoft Research, chowang@microsoft.com); Jianfeng Gao (Deep Learning Technology Center, Microsoft Research, jfgao@microsoft.com); John Paisley (Columbia University, jpaisley@columbia.edu)
Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block.
Open Source Code | No | "Our code will be made publicly available for reproducibility."
Open Datasets | Yes | "For word prediction we use the Penn Treebank dataset, a standard benchmark for assessing new language models (Marcus et al., 1993). For sentiment analysis we use the IMDB 100k dataset (Maas et al., 2011), also a common benchmark dataset for this application." These datasets are publicly available at http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz and http://ai.stanford.edu/~amaas/data/sentiment/.
Dataset Splits | Yes | "We use the standard split, where sections 0-20 (930K tokens) are used for training, sections 21-22 (74K tokens) for validation, and sections 23-24 (82K tokens) for testing (Mikolov et al., 2010)."
Hardware Specification | Yes | "These experiments were run on a Microsoft Azure NC12, which has 12 cores, 2 Tesla K80 GPUs, and 112 GB of memory." "This experiment took close to 78 hours on a MacBook Pro quad-core with 16 GB of RAM."
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | "For these experiments, we used a multilayer perceptron with 2 hidden layers and 200 hidden units per layer for the inference network. The number of topics was tuned depending on the size of the RNN: for 10 neurons we used 18 topics; for 100 and 300 neurons we found 50 topics to be optimal. We used a maximum of 15 epochs for the experiments and performed early stopping using the validation set. For comparison purposes we did not apply dropout and used 1 layer for the RNN and its counterparts in all the word prediction experiments, as reported in Table 2."
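As a rough illustration of the quoted inference-network configuration, here is a minimal NumPy sketch assuming the 2-hidden-layer, 200-unit MLP and the 50-topic setting from the quote. The input (a bag-of-words document vector) and the Gaussian output parameterization follow the paper's variational-inference setup; all function and variable names, the vocabulary size, and the weight initialization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_inference_net(vocab_size, num_topics=50, hidden=200):
    """Sketch of the paper's inference network: an MLP with 2 hidden
    layers of 200 units mapping a bag-of-words document vector to
    Gaussian parameters (mean, log-variance) over the topic vector.
    Hyperparameters match the quote; everything else is assumed."""
    sizes = [vocab_size, hidden, hidden]
    weights = [rng.normal(0, 0.01, (m, n)) for m, n in zip(sizes, sizes[1:])]
    w_mu = rng.normal(0, 0.01, (hidden, num_topics))
    w_logvar = rng.normal(0, 0.01, (hidden, num_topics))

    def forward(bow):
        h = bow
        for w in weights:
            h = np.maximum(h @ w, 0.0)   # ReLU hidden layers
        return h @ w_mu, h @ w_logvar    # mean and log-variance per topic

    return forward

# Hypothetical usage: a 10,000-word vocabulary and one (empty) document.
net = make_inference_net(vocab_size=10000)
mu, logvar = net(np.zeros(10000))
```

The 18-topic / 10-neuron configuration mentioned in the quote would simply pass `num_topics=18` instead.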