Large Memory Layers with Product Keys

Authors: Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We report results on large-scale experiments for transformer models equipped with a memory, followed by an ablation study that shows the impact of different memory components on the model performance and memory usage.
Researcher Affiliation | Collaboration | Facebook AI Research; Sorbonne Universités, UPMC Univ Paris 06, UMR 7606, LIP6
Pseudocode | No | The paper describes the memory design and key-selection process in text and figures, but does not provide structured pseudocode or algorithm blocks (a hedged sketch of the key-selection step is given after this table).
Open Source Code | Yes | We release our code for reproducibility purposes. https://github.com/facebookresearch/XLM
Open Datasets | Yes | We therefore evaluate the benefit of our approach on a corpus that is 30 times larger and extracted from the public Common Crawl. The training set is composed of 28 billion words (140 GB of data) extracted from about 40 million English news articles indexed by Common Crawl corpora.
Dataset Splits | Yes | The validation and test sets are both composed of 5000 news articles removed from the training set.
Hardware Specification | Yes | We implement our models with PyTorch [35], and train them on 32 Volta GPUs.
Software Dependencies | No | The paper mentions software such as PyTorch, fastBPE, and the Moses toolkit, but does not specify version numbers (e.g., "We implement our models with PyTorch [35]").
Experiment Setup | Yes | We train our models with the Adam optimizer [25], with a learning rate of 2.5 × 10⁻⁴, with β1 = 0.9, β2 = 0.98, following the learning rate schedule of Vaswani et al. [44]. ... In our main experiments, we use H = 4 memory heads, we select k = 32 keys per head, and use |K| = 512² memory slots. (A sketch of this training setup is given after the table.)
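
Because the paper gives no pseudocode for the memory layer, the following is a minimal sketch of the product-key top-k selection it describes, assuming a single memory head and omitting the query network and batch normalization. The function name `product_key_topk` and all tensor shapes are illustrative and not taken from the released code.

```python
# Hedged sketch of product-key nearest-neighbor selection (single head),
# based on the paper's description: split the query in two halves, rank each
# half against a small sub-key codebook, then combine the two candidate sets.
import torch
import torch.nn.functional as F

def product_key_topk(query, sub_keys_1, sub_keys_2, k=32):
    """
    query:      (batch, d)        query produced by the query network
    sub_keys_1: (n_sub, d // 2)   first half-space codebook, n_sub = sqrt(|K|)
    sub_keys_2: (n_sub, d // 2)   second half-space codebook
    Returns scores and indices of the k best keys among n_sub ** 2 product keys.
    """
    d = query.shape[-1]
    n_sub = sub_keys_1.shape[0]
    q1, q2 = query[:, : d // 2], query[:, d // 2:]

    # Scores of each query half against its codebook: (batch, n_sub)
    s1 = q1 @ sub_keys_1.t()
    s2 = q2 @ sub_keys_2.t()

    # Top-k candidates in each half-space: (batch, k)
    s1_top, i1 = s1.topk(k, dim=1)
    s2_top, i2 = s2.topk(k, dim=1)

    # Cartesian product of the two candidate sets: (batch, k, k)
    cand_scores = s1_top.unsqueeze(2) + s2_top.unsqueeze(1)
    cand_ids = i1.unsqueeze(2) * n_sub + i2.unsqueeze(1)

    # Final top-k over the k * k candidates
    scores, best = cand_scores.view(-1, k * k).topk(k, dim=1)
    indices = cand_ids.view(-1, k * k).gather(1, best)
    return scores, indices

# Usage: sparse weighted read from a large value table with |K| = n_sub ** 2 slots
batch, d, n_sub, k = 8, 512, 512, 32
values = torch.nn.EmbeddingBag(n_sub ** 2, 1024, mode="sum")
q = torch.randn(batch, d)
sk1, sk2 = torch.randn(n_sub, d // 2), torch.randn(n_sub, d // 2)
scores, indices = product_key_topk(q, sk1, sk2, k)
output = values(indices, per_sample_weights=F.softmax(scores, dim=1))  # (batch, 1024)
```

The point of the product-key structure is visible in the sketch: the two top-k searches cost O(sqrt(|K|)) each instead of scoring all |K| = 512² keys directly.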
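The reported optimization setup (Adam, peak learning rate 2.5 × 10⁻⁴, β1 = 0.9, β2 = 0.98, Vaswani et al. schedule) could be reproduced along the lines below. This is a sketch, not the released training code: the warmup length and model dimension are assumptions not stated in the excerpt, and the `Linear` module is only a stand-in for the memory-augmented transformer.

```python
# Hedged sketch of the reported optimizer and learning-rate schedule.
import torch

d_model, warmup_steps = 1024, 4000  # assumed values, not given in the excerpt

model = torch.nn.Linear(d_model, d_model)  # stand-in for the actual model
optimizer = torch.optim.Adam(model.parameters(), lr=2.5e-4, betas=(0.9, 0.98))

def vaswani_lr_scale(step, warmup=warmup_steps):
    # Inverse-sqrt schedule from "Attention Is All You Need",
    # rescaled so its peak (at step == warmup) equals the base lr.
    step = max(step, 1)
    return min(step ** -0.5, step * warmup ** -1.5) * warmup ** 0.5

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=vaswani_lr_scale)

# Training-loop skeleton with a dummy objective
for step in range(10):
    optimizer.zero_grad()
    loss = model(torch.randn(2, d_model)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()
```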