Unbounded cache model for online language modeling with open vocabulary

Authors: Edouard Grave, Moustapha M. Cisse, Armand Joulin

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments showing that our approach significantly improves the perplexity of pre-trained language models on new distributions, and can scale efficiently to much larger contexts than previously proposed local cache models. ... In this section, we present evaluations of our unbounded cache model on different language modeling tasks. We first briefly describe our experimental setting and the datasets we used, before presenting the results.
Researcher Affiliation | Industry | Edouard Grave (Facebook AI Research, egrave@fb.com); Moustapha Cisse (Facebook AI Research, moustaphacisse@fb.com); Armand Joulin (Facebook AI Research, ajoulin@fb.com)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using an existing open-source library ("We use the IVFPQ implementation from the FAISS open source library."), but does not state that the authors are releasing their own source code for the methodology described in the paper.
Open Datasets | Yes | News Crawl (http://www.statmt.org/wmt14/translation-task.html) ... News Commentary ... Common Crawl ... WikiText (https://metamind.io/research/the-wikitext-long-term-dependency-language-modeling-dataset/) ... the Book Corpus (http://www.gutenberg.org/) ... All these datasets are publicly available.
Dataset Splits | No | The paper states, "Unless stated otherwise, we use 2 million tokens for training the static models and 10 million tokens for evaluation." However, it does not specify explicit training/validation/test dataset splits (e.g., percentages or exact token counts for each split).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or other machine specifications) used for running its experiments.
Software Dependencies | No | The paper mentions the "FAISS open source library" and "europarl dataset tools" but does not give version numbers for these or any other key software components, so the ancillary software needed to reproduce the experiments is not fully specified.
Experiment Setup | Yes | We train recurrent neural networks with 256 LSTM hidden units, using the Adagrad algorithm with a learning rate of 0.2 and 10 epochs. We compute the gradients using backpropagation through time (BPTT) over 20 timesteps. Because of the large vocabulary sizes, we use the adaptative softmax [21]. We use the IVFPQ implementation from the FAISS open source library. We use 4,096 centroids and 8 probes for the inverted file. Unless said otherwise, we query the 1,024 nearest neighbors.
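
To make the retrieval settings quoted in the last row concrete, the following is a minimal sketch of how such an IVFPQ index could be configured with FAISS. Only the 4,096 centroids, 8 probes, and 1,024 retrieved neighbors come from the paper; the hidden-state dimensionality (set to 256 to match the LSTM size), the product-quantization parameters, and the placeholder data are assumptions.

```python
import numpy as np
import faiss  # FAISS open source library referenced by the paper

d = 256          # hidden-state dimensionality; assumed equal to the 256 LSTM units
nlist = 4096     # number of centroids for the inverted file (stated in the paper)
m, nbits = 8, 8  # product-quantization layout; not specified in the paper (assumption)

# Coarse quantizer plus IVFPQ index over cached hidden states.
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
index.nprobe = 8  # number of inverted lists probed per query (stated in the paper)

# Placeholder cache keys: in the unbounded cache model these would be the hidden
# states produced by the pre-trained language model on the test stream.
cache_keys = np.random.rand(200_000, d).astype("float32")
index.train(cache_keys)
index.add(cache_keys)

# Retrieve the 1,024 nearest cached states for a batch of query hidden states.
# Entries are padded with -1 when fewer than k candidates fall in the probed lists.
queries = np.random.rand(32, d).astype("float32")
distances, neighbor_ids = index.search(queries, 1024)
```

In the paper's model, the retrieved hidden states and the words stored with them are used to build the cache distribution that complements the pre-trained static model; that interpolation step is not shown here.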