Character n-Gram Embeddings to Improve RNN Language Models

Authors: Sho Takase, Jun Suzuki, Masaaki Nagata

AAAI 2019, pp. 5074-5082

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments indicate that the proposed method outperforms neural language models trained with well-tuned hyperparameters and achieves state-of-the-art scores on each dataset. In addition, we incorporate our proposed method into a standard neural encoder-decoder model and investigate its effect on machine translation and headline generation. We indicate that the proposed method also has a positive effect on such tasks.
Researcher Affiliation | Collaboration | NTT Communication Science Laboratories; Tohoku University. Emails: sho.takase@nlp.c.titech.ac.jp, jun.suzuki@ecei.tohoku.ac.jp, nagata.masaaki@lab.ntt.co.jp. Current affiliation: Tokyo Institute of Technology.
Pseudocode | No | The paper does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions external implementations (e.g., https://github.com/salesforce/awd-lstm-lm) for baseline models but does not provide an explicit statement or link to the source code of the authors' own proposed method or implementation.
Open Datasets | Yes | We used the standard benchmark datasets for word-level language modeling: Penn Treebank (PTB) (Marcus, Marcinkiewicz, and Santorini 1993), WikiText-2 (WT2), and WikiText-103 (WT103) (Merity et al. 2017). Mikolov et al. (2010) and Merity et al. (2017) published pre-processed versions of PTB (http://www.fit.vutbr.cz/~mikolov/rnnlm/), WT2, and WT103 (https://einstein.ai/research/the-wikitext-long-term-dependency-language-modeling-dataset). Following the previous studies, we used these pre-processed datasets for our experiments. Table 1 describes the statistics of the datasets. (A generic data-loading sketch is included after this table.)
Dataset Splits | Yes | Table 1 ("Statistics of PTB, WT2, and WT103") provides Vocab, Train, Valid, and Test counts for each dataset.
Hardware Specification | Yes | We calculated it on the NVIDIA Tesla P100.
Software Dependencies | No | The paper mentions using specific models such as LSTM and QRNN and refers to implementations of baseline models, but it does not provide version numbers for any software dependencies (e.g., Python, PyTorch/TensorFlow, or other libraries).
Experiment Setup | Yes | We set the embedding size and dimension of the LSTM hidden state to 500 for machine translation and 400 for headline generation. The mini-batch size is 64 for machine translation and 256 for headline generation. For other hyperparameters, we followed the configurations described in (Kiyono et al. 2017).
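
The Open Datasets entry above refers to the standard pre-processed, word-level versions of PTB, WT2, and WT103. The sketch below shows one common way such token files are read and numericalized; it is not taken from the paper, and the file names (ptb.train.txt, ptb.valid.txt) and helper functions (read_tokens, build_vocab, encode) are illustrative assumptions.

```python
# Minimal sketch (not from the paper): reading pre-processed, word-level
# language-modeling files and mapping tokens to integer ids.
from collections import Counter

def read_tokens(path):
    """Return the file as a flat list of word tokens, with <eos> appended per line."""
    tokens = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens.extend(line.split() + ["<eos>"])
    return tokens

def build_vocab(tokens):
    """Map every word type to an integer id, most frequent first."""
    counts = Counter(tokens)
    return {w: i for i, (w, _) in enumerate(counts.most_common())}

def encode(tokens, vocab):
    """Convert a token list into a list of integer ids."""
    return [vocab[w] for w in tokens]

if __name__ == "__main__":
    train = read_tokens("ptb.train.txt")   # assumed file name
    vocab = build_vocab(train)
    train_ids = encode(train, vocab)
    print(f"vocab size: {len(vocab)}, train tokens: {len(train_ids)}")
```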
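
The Experiment Setup entry reports only the embedding/hidden sizes (500 for machine translation, 400 for headline generation) and the mini-batch sizes (64 and 256). The sketch below wires those numbers into a generic PyTorch LSTM encoder-decoder as a sanity check that the reported dimensions fit together; the vocabulary size of 32,000 and the model structure are assumptions, not the authors' implementation, and the remaining hyperparameters (taken from Kiyono et al. 2017) are not reproduced here.

```python
# Sketch of the reported sizes only; a generic LSTM encoder-decoder,
# NOT the authors' implementation.
import torch
import torch.nn as nn

CONFIGS = {
    "machine_translation": {"emb": 500, "hidden": 500, "batch_size": 64},
    "headline_generation": {"emb": 400, "hidden": 400, "batch_size": 256},
}

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb, hidden):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_emb(src))            # encode source tokens
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)   # teacher-forced decoding
        return self.out(dec_out)                              # logits over target vocab

cfg = CONFIGS["machine_translation"]
model = Seq2Seq(src_vocab=32000, tgt_vocab=32000,             # 32k vocab is an assumption
                emb=cfg["emb"], hidden=cfg["hidden"])
src = torch.randint(0, 32000, (cfg["batch_size"], 20))        # dummy batch of token ids
tgt = torch.randint(0, 32000, (cfg["batch_size"], 22))
logits = model(src, tgt)                                      # shape: (64, 22, 32000)
```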