Unbounded cache model for online language modeling with open vocabulary

Authors: Edouard Grave, Moustapha M. Cisse, Armand Joulin

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments showing that our approach significantly improves the perplexity of pre-trained language models on new distributions, and can scale efficiently to much larger contexts than previously proposed local cache models. ... In this section, we present evaluations of our unbounded cache model on different language modeling tasks. We first briefly describe our experimental setting and the datasets we used, before presenting the results.
Researcher Affiliation | Industry | Edouard Grave (Facebook AI Research, egrave@fb.com); Moustapha Cisse (Facebook AI Research, moustaphacisse@fb.com); Armand Joulin (Facebook AI Research, ajoulin@fb.com)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using an existing open-source library ("We use the IVFPQ implementation from the FAISS open source library."), but does not state that the authors are releasing their own source code for the methodology described in the paper.
Open Datasets | Yes | News Crawl (http://www.statmt.org/wmt14/translation-task.html) ... News Commentary ... Common Crawl ... WikiText (https://metamind.io/research/the-wikitext-long-term-dependency-language-modeling-dataset/) ... the Book Corpus (http://www.gutenberg.org/) ... All these datasets are publicly available.
Dataset Splits | No | The paper states, "Unless stated otherwise, we use 2 million tokens for training the static models and 10 million tokens for evaluation." However, it does not specify explicit training/validation/test dataset splits (e.g., percentages or exact token counts for each split).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or other machine specifications) used for running its experiments.
Software Dependencies | No | The paper mentions the "FAISS open source library" and "europarl dataset tools" but does not give version numbers for these or any other key software components, so the ancillary software needed to reproduce the experiments is not fully specified.
Experiment Setup | Yes | We train recurrent neural networks with 256 LSTM hidden units, using the Adagrad algorithm with a learning rate of 0.2 and 10 epochs. We compute the gradients using backpropagation through time (BPTT) over 20 timesteps. Because of the large vocabulary sizes, we use the adaptative softmax [21]. We use the IVFPQ implementation from the FAISS open source library. We use 4,096 centroids and 8 probes for the inverted file. Unless said otherwise, we query the 1,024 nearest neighbors.
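
To make the retrieval settings quoted in the last row concrete, the following is a minimal sketch of how such an IVFPQ index could be configured with FAISS. Only the 4,096 centroids, 8 probes, and 1,024 retrieved neighbors come from the paper; the hidden-state dimensionality (set to 256 to match the LSTM size), the product-quantization parameters, and the placeholder data are assumptions.

```python
import numpy as np
import faiss  # FAISS open source library referenced by the paper

d = 256          # hidden-state dimensionality; assumed equal to the 256 LSTM units
nlist = 4096     # number of centroids for the inverted file (stated in the paper)
m, nbits = 8, 8  # product-quantization layout; not specified in the paper (assumption)

# Coarse quantizer plus IVFPQ index over cached hidden states.
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
index.nprobe = 8  # number of inverted lists probed per query (stated in the paper)

# Placeholder cache keys: in the unbounded cache model these would be the hidden
# states produced by the pre-trained language model on the test stream.
cache_keys = np.random.rand(200_000, d).astype("float32")
index.train(cache_keys)
index.add(cache_keys)

# Retrieve the 1,024 nearest cached states for a batch of query hidden states.
# Entries are padded with -1 when fewer than k candidates fall in the probed lists.
queries = np.random.rand(32, d).astype("float32")
distances, neighbor_ids = index.search(queries, 1024)
```

In the paper's model, the retrieved hidden states and the words stored with them are used to build the cache distribution that complements the pre-trained static model; that interpolation step is not shown here.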