Compressed Nonparametric Language Modelling

Authors: Ehsan Shareghi, Gholamreza Haffari, Trevor Cohn

IJCAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results illustrate that our model can be built on significantly larger datasets compared to previous HPYP models, while being several orders of magnitudes smaller, fast for training and inference, and outperforming the perplexity of the state-of-the-art Modified Kneser-Ney count-based LM smoothing by up to 15%.
Researcher Affiliation | Academia | Faculty of Information Technology, Monash University; Computing and Information Systems, The University of Melbourne; first.last@{monash.edu, unimelb.edu.au}
Pseudocode | Yes | Algorithm 1: Gibbs Sampler
Open Source Code | No | The paper does not provide access to the source code for its methodology. A link to a third-party baseline (SM) is provided, but not to the authors' own implementation.
Open Datasets | Yes | We report the perplexity of KN, MKN, SM, and our approach CN using the Finnish (FI), Spanish (ES), German (DE), English (EN), French (FR) portions of the Europarl v7 [Koehn, 2005] corpus, as well as 250MiB, 500MiB, 1, 2, 4, and 8GiB chunks of the English Common Crawl corpus [Buck et al., 2014].
Dataset Splits | No | The paper uses "newstest-2014" and "newstest-2013" as test sets, and provides train and test set sizes in Table 4, but it does not specify explicit validation splits (e.g., percentages or counts for a validation set).
Hardware Specification | Yes | All experiments are done on a single core on Intel Xeon E5-2667 3.2GHz and 180GiB of RAM.
Software Dependencies | No | The paper mentions using the SRILM toolkit for Kneser-Ney and Modified Kneser-Ney perplexity measurements, but it does not specify a version for SRILM or any other software dependency.
Experiment Setup | Yes | In our model, the discount parameters are set to Kneser-Ney discounts and tied based on the context size |u|, while each distribution uses its own separate concentration parameter. ... Instead, we follow a non-uniform sampling by shrinking the range to 1 ≤ t_u^w ≤ min{M, n_u^w} (here M = 10).
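
As a rough illustration of the truncated sampling range quoted in the Experiment Setup row, the Python sketch below draws a table count t from the shrunken support 1 ≤ t ≤ min{M, n_u^w} with M = 10. It is not taken from the paper: the function name and the unnormalised_weight callback are hypothetical placeholders standing in for the hierarchical Pitman-Yor posterior term, which the paper derives but which is not reproduced here.

```python
import random

M = 10  # truncation constant reported in the paper's experiment setup


def sample_table_count(n_uw, unnormalised_weight, rng=random):
    """Draw a table count t from the shrunken support 1..min(M, n_uw).

    n_uw: customer count for the (context u, word w) pair.
    unnormalised_weight(t): hypothetical callback returning the unnormalised
        posterior weight of t tables under the HPYP model (not shown here).
    """
    support = list(range(1, min(M, n_uw) + 1))
    weights = [unnormalised_weight(t) for t in support]
    return rng.choices(support, weights=weights, k=1)[0]


# Purely illustrative usage with a dummy weight function:
# t = sample_table_count(n_uw=37, unnormalised_weight=lambda t: 1.0 / t)
```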