Unbounded cache model for online language modeling with open vocabulary
Authors: Edouard Grave, Moustapha M. Cisse, Armand Joulin
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments showing that our approach significantly improves the perplexity of pre-trained language models on new distributions, and can scale efficiently to much larger contexts than previously proposed local cache models. ... In this section, we present evaluations of our unbounded cache model on different language modeling tasks. We first briefly describe our experimental setting and the datasets we used, before presenting the results. |
| Researcher Affiliation | Industry | Edouard Grave Facebook AI Research egrave@fb.com Moustapha Cisse Facebook AI Research moustaphacisse@fb.com Armand Joulin Facebook AI Research ajoulin@fb.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using an existing open-source library ("We use the IVFPQ implementation from the FAISS open source library."), but does not state that the authors are releasing their own source code for the methodology described in the paper. |
| Open Datasets | Yes | News Crawl... News Commentary... Common Crawl... WikiText... the book corpus... All these datasets are publicly available. Footnote URLs: http://www.statmt.org/wmt14/translation-task.html ... https://metamind.io/research/the-wikitext-long-term-dependency-language-modeling-dataset/ ... http://www.gutenberg.org/ |
| Dataset Splits | No | The paper states, "Unless stated otherwise, we use 2 million tokens for training the static models and 10 million tokens for evaluation." However, it does not specify explicit training/validation/test dataset splits (e.g., percentages or exact counts for each split). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions the "FAISS open source library" and "europarl dataset tools" but does not provide version numbers for these or any other key software components, as required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | We train recurrent neural networks with 256 LSTM hidden units, using the Adagrad algorithm with a learning rate of 0.2 and 10 epochs. We compute the gradients using backpropagation through time (BPTT) over 20 timesteps. Because of the large vocabulary sizes, we use the adaptive softmax [21]. We use the IVFPQ implementation from the FAISS open source library. We use 4,096 centroids and 8 probes for the inverted file. Unless said otherwise, we query the 1,024 nearest neighbors. |
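
The retrieval side of the setup quoted in the last row can be reconstructed almost directly from the stated hyperparameters. Below is a minimal sketch, assuming Python and the FAISS library, of building an IVFPQ index with 4,096 centroids, probing 8 inverted lists per query, and retrieving the 1,024 nearest neighbors. The key dimensionality (256, matching the LSTM hidden size), the number of product-quantization sub-quantizers, and the random placeholder data are assumptions not stated in the paper.

```python
import numpy as np
import faiss  # open-source similarity-search library cited in the paper

d = 256        # key dimensionality; assumed to match the 256 LSTM hidden units
nlist = 4096   # number of centroids for the inverted file, as quoted above
m = 8          # product-quantization sub-quantizers (assumption; not stated in the paper)
nbits = 8      # bits per sub-quantizer code (FAISS default)

# IVFPQ index: a coarse quantizer over nlist centroids plus product quantization.
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

# Train the codebooks on a sample of cached hidden states (random placeholders here),
# then add the cached keys to the index.
keys = np.random.rand(100_000, d).astype("float32")
index.train(keys)
index.add(keys)

index.nprobe = 8   # probe 8 inverted lists per query, as quoted above

# Query with the current hidden state and retrieve the 1,024 nearest cached keys.
query = np.random.rand(1, d).astype("float32")
distances, neighbors = index.search(query, 1024)
```

The remaining hyperparameters in that row (Adagrad with a learning rate of 0.2, BPTT over 20 timesteps, adaptive softmax) belong to the recurrent language model that produces the keys and queries, not to the index itself.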