Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings

Authors: Rie Johnson, Tong Zhang

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We report performances exceeding the previous best results on four benchmark datasets.
Researcher Affiliation | Industry | Rie Johnson (RIEJOHNSON@GMAIL.COM), RJ Research Consulting, Tarrytown NY, USA; Tong Zhang (TONGZHANG@BAIDU.COM), Big Data Lab, Baidu Inc., Beijing, China
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code and experimental details are available at http://riejohnson.com/cnn_download.html.
Open Datasets | Yes | We used four datasets: IMDB, Elec, RCV1 (second-level topics), and 20-newsgroup (20NG)... The first three were used in JZ15. IMDB and 20NG were used in DL15. The datasets are summarized in Table 2. Unlabeled data (Table 4): IMDB 75K (20M words), provided; Elec 200K (24M words), provided; RCV1 669K (183M words), Sept '96 to June '97. See JZ15b for more details.
Dataset Splits | Yes | Hyper-parameters such as learning rates were chosen based on the performance on the development data, which was a held-out portion of the training data, and training was redone using all the training data with the chosen parameters. (A minimal sketch of this protocol appears after the table.)
Hardware Specification | Yes | Time: seconds per epoch for training on Tesla M2070.
Software Dependencies | No | The paper mentions various algorithms and tools (e.g., SGD, rmsprop, dropout, word2vec) but does not provide specific version numbers for any software libraries or frameworks used in the experiments.
Experiment Setup | Yes | square loss was minimized with dropout [...] applied to the input to the top layer; weights were initialized by the Gaussian distribution with zero mean and standard deviation 0.01. Optimization was done with SGD with mini-batch size 50 or 100 with momentum or optionally rmsprop [...] for acceleration. [...] pooling is done for k non-overlapping regions of equal size [...]; max-pooling with k=1 on IMDB and Elec and average-pooling with k=10 on RCV1; on 20NG, max-pooling with k=10 was chosen. [...] We trained two LSTMs (forward and backward) with 100 units each on unlabeled data. The training objective was to predict the next k words where k was set to 20 for RCV1 and 5 for others. (Minimal sketches of the supervised configuration and the unsupervised pre-training objective appear after the table.)
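
The Dataset Splits row describes a simple model-selection protocol: tune hyper-parameters on a held-out development portion of the training data, then retrain on the full training set with the chosen values. The sketch below illustrates that protocol; the classifier, the candidate learning rates, the dev fraction, and the 20-newsgroups loader are stand-ins chosen for illustration, not the authors' pipeline.

```python
# Minimal sketch of "choose hyper-parameters on held-out dev data, then retrain
# on all training data". The model and the learning-rate grid are assumptions.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# A public text dataset as a placeholder for IMDB / Elec / RCV1 / 20NG.
data = fetch_20newsgroups(subset="train")
X = TfidfVectorizer(max_features=20000).fit_transform(data.data)
y = data.target

# Hold out a portion of the training data as development data.
X_tr, X_dev, y_tr, y_dev = train_test_split(X, y, test_size=0.1, random_state=0)

best_lr, best_acc = None, -1.0
for lr in (0.01, 0.05, 0.1):                     # hypothetical candidate grid
    clf = SGDClassifier(learning_rate="constant", eta0=lr, random_state=0)
    clf.fit(X_tr, y_tr)
    acc = accuracy_score(y_dev, clf.predict(X_dev))
    if acc > best_acc:
        best_lr, best_acc = lr, acc

# Training is redone on *all* the training data with the chosen parameter.
final_model = SGDClassifier(learning_rate="constant", eta0=best_lr, random_state=0)
final_model.fit(X, y)
```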
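The Experiment Setup row can be read as a concrete supervised configuration: a bidirectional LSTM region embedding whose outputs are pooled over k non-overlapping regions, dropout applied to the input of the top layer, Gaussian(0, 0.01) weight initialization, and SGD with momentum minimizing square loss on mini-batches of 50. The PyTorch sketch below assumes the vocabulary size, sequence length, embedding width, dropout rate, and learning rate purely for illustration; it is not the authors' released code (see the Open Source Code row for that).

```python
# A minimal PyTorch sketch of the quoted training configuration. Dimensions,
# dropout rate, and learning rate are illustrative assumptions.
import torch
import torch.nn as nn

class LstmRegionClassifier(nn.Module):
    def __init__(self, vocab=30000, emb=100, hidden=100, k=10, n_classes=2):
        super().__init__()
        self.k = k
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.drop = nn.Dropout(0.5)          # dropout on the input to the top layer
        self.top = nn.Linear(2 * hidden * k, n_classes)
        for p in self.parameters():          # zero-mean Gaussian init, std 0.01
            nn.init.normal_(p, mean=0.0, std=0.01)

    def forward(self, tokens):               # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens)) # (batch, seq_len, 2 * hidden)
        # Max-pooling over k non-overlapping regions of equal size
        # (the paper also uses k=1 and average-pooling, depending on the dataset).
        regions = h.chunk(self.k, dim=1)
        pooled = torch.cat([r.max(dim=1).values for r in regions], dim=1)
        return self.top(self.drop(pooled))

model = LstmRegionClassifier()
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # lr is illustrative
loss_fn = nn.MSELoss()                        # "square loss" against one-hot targets

tokens = torch.randint(0, 30000, (50, 200))   # mini-batch of 50, as in the quote
targets = nn.functional.one_hot(torch.randint(0, 2, (50,)), 2).float()
opt.zero_grad()
loss = loss_fn(model(tokens), targets)
loss.backward()
opt.step()
```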
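The same row also quotes the unsupervised objective: LSTMs trained on unlabeled data to predict the next k words. One way to read that is as a multi-label prediction, at each position, of which words occur among the following k tokens; a backward LSTM would be trained the same way on reversed text. The sketch below follows that reading under assumed vocabulary size, k, and a toy batch; it is a sketch of the objective, not the authors' implementation.

```python
# Minimal sketch of the "predict the next k words" objective on unlabeled text.
# Vocabulary size, k, and the random token batch are assumptions.
import torch
import torch.nn as nn

vocab, hidden, k = 30000, 100, 5              # k = 5 here (20 for RCV1 in the paper)

embed = nn.Embedding(vocab, hidden)
lstm = nn.LSTM(hidden, hidden, batch_first=True)
predict = nn.Linear(hidden, vocab)            # scores over the vocabulary
loss_fn = nn.BCEWithLogitsLoss()

tokens = torch.randint(0, vocab, (32, 120))   # a batch of unlabeled word sequences

# Multi-label targets: for each position t, mark the words at t+1 .. t+k.
targets = torch.zeros(32, 120 - k, vocab)
for offset in range(1, k + 1):
    targets.scatter_(2, tokens[:, offset:offset + 120 - k].unsqueeze(-1), 1.0)

states, _ = lstm(embed(tokens))               # (32, 120, hidden)
logits = predict(states[:, :120 - k])         # positions that have k future words
loss = loss_fn(logits, targets)
loss.backward()
```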