Character n-Gram Embeddings to Improve RNN Language Models
Authors: Sho Takase, Jun Suzuki, Masaaki Nagata (pp. 5074-5082)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments indicate that the proposed method outperforms neural language models trained with well-tuned hyperparameters and achieves state-of-the-art scores on each dataset. In addition, we incorporate our proposed method into a standard neural encoder-decoder model and investigate its effect on machine translation and headline generation. We indicate that the proposed method also has a positive effect on such tasks. |
| Researcher Affiliation | Collaboration | NTT Communication Science Laboratories; Tohoku University. Emails: sho.takase@nlp.c.titech.ac.jp, jun.suzuki@ecei.tohoku.ac.jp, nagata.masaaki@lab.ntt.co.jp. Current affiliation: Tokyo Institute of Technology. |
| Pseudocode | No | The paper does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions external implementations (e.g., https://github.com/salesforce/awd-lstm-lm) for baseline models but does not provide an explicit statement or link for the source code of their *own* proposed method or implementations. |
| Open Datasets | Yes | We used the standard benchmark datasets for word-level language modeling: Penn Treebank (PTB) (Marcus, Marcinkiewicz, and Santorini 1993), WikiText-2 (WT2), and WikiText-103 (WT103) (Merity et al. 2017). (Mikolov et al. 2010) and (Merity et al. 2017) published pre-processed PTB, WT2, and WT103. Following the previous studies, we used these pre-processed datasets for our experiments. Table 1 describes the statistics of the datasets. Footnote URLs: http://www.fit.vutbr.cz/~mikolov/rnnlm/ and https://einstein.ai/research/the-wikitext-long-term-dependency-language-modeling-dataset |
| Dataset Splits | Yes | Table 1: Statistics of PTB, WT2, and WT103. Vocab, Train, Valid, Test counts are provided. |
| Hardware Specification | Yes | We calculated it on the NVIDIA Tesla P100. |
| Software Dependencies | No | The paper mentions using specific models like LSTM and QRNN, and refers to implementations of baseline models, but does not provide specific version numbers for any software dependencies or libraries (e.g., Python version, PyTorch/TensorFlow version, or specific library versions). |
| Experiment Setup | Yes | We set the embedding size and dimension of the LSTM hidden state to 500 for machine translation and 400 for headline generation. The mini-batch size is 64 for machine translation and 256 for headline generation. For other hyperparameters, we followed the configurations described in (Kiyono et al. 2017). |
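The "Experiment Setup" row reports only a handful of hyperparameters (embedding/hidden size 500 for machine translation and 400 for headline generation; mini-batch sizes of 64 and 256). The sketch below is a minimal, hedged illustration of how those reported values could be plugged into a plain PyTorch LSTM encoder-decoder; it is not the authors' implementation, omits the paper's character n-gram embedding composition and attention, and the vocabulary sizes, single-layer wiring, and sequence lengths are placeholder assumptions.

```python
# Minimal sketch (not the authors' code): wiring the hyperparameters reported
# in the "Experiment Setup" row into a plain LSTM encoder-decoder.
# Vocabulary sizes, layer counts, and sequence lengths are illustrative assumptions.

import torch
import torch.nn as nn

# Reported settings: embedding/hidden size 500 for machine translation (MT),
# 400 for headline generation; mini-batch 64 for MT, 256 for headline generation.
CONFIGS = {
    "machine_translation": {"emb_size": 500, "hidden_size": 500, "batch_size": 64},
    "headline_generation": {"emb_size": 400, "hidden_size": 400, "batch_size": 256},
}


class Seq2Seq(nn.Module):
    """Illustrative single-layer LSTM encoder-decoder (no attention)."""

    def __init__(self, src_vocab, tgt_vocab, emb_size, hidden_size):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_size)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_size)
        self.encoder = nn.LSTM(emb_size, hidden_size, batch_first=True)
        self.decoder = nn.LSTM(emb_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, tgt_vocab)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_emb(src))            # encode source tokens
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)   # decode with teacher forcing
        return self.out(dec_out)                              # logits over target vocabulary


if __name__ == "__main__":
    cfg = CONFIGS["machine_translation"]
    model = Seq2Seq(src_vocab=32000, tgt_vocab=32000,          # placeholder vocab sizes
                    emb_size=cfg["emb_size"], hidden_size=cfg["hidden_size"])
    src = torch.randint(0, 32000, (cfg["batch_size"], 20))     # dummy source batch
    tgt = torch.randint(0, 32000, (cfg["batch_size"], 20))     # dummy target batch
    print(model(src, tgt).shape)                               # torch.Size([64, 20, 32000])
```

The remaining hyperparameters are not specified in the excerpt; the paper defers them to the configurations described in (Kiyono et al. 2017), so reproducing the full setup requires consulting that reference.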