Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Authors: Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We fine-tune and evaluate our models on a wide range of knowledge-intensive NLP tasks and set the state of the art on three open domain QA tasks, outperforming parametric seq2seq models and task-specific retrieve-and-extract architectures.
Researcher Affiliation | Collaboration | Facebook AI Research; University College London; New York University; plewis@fb.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code to run experiments with RAG has been open-sourced as part of the Hugging Face Transformers Library [66] and can be found at https://github.com/huggingface/transformers/blob/master/examples/rag/. An interactive demo of RAG models can be found at https://huggingface.co/rag/
Open Datasets | Yes | We consider four popular open-domain QA datasets: Natural Questions (NQ) [29], TriviaQA (TQA) [24], WebQuestions (WQ) [3] and CuratedTrec (CT) [2]... We use the MSMARCO NLG task v2.1 [43]... We use the splits from SearchQA [10]... FEVER [56]... We use a single Wikipedia dump for our non-parametric knowledge source. Following Lee et al. [31] and Karpukhin et al. [26], we use the December 2018 dump.
Dataset Splits | Yes | We consider k ∈ {5, 10} for training and set k for test time using dev data.
Hardware Specification | No | The paper discusses the models and datasets used but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) on which the experiments were run.
Software Dependencies | No | The paper mentions software components like Hugging Face Transformers Library and FAISS but does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | Given a fine-tuning training corpus of input/output pairs (x_j, y_j), we minimize the negative marginal log-likelihood of each target, ∑_j −log p(y_j|x_j), using stochastic gradient descent with Adam [28]... We consider k ∈ {5, 10} for training and set k for test time using dev data.
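
As a starting point for the Open Source Code row above, a minimal sketch of loading a pretrained RAG model through the Hugging Face Transformers integration looks roughly as follows. The class and checkpoint names (e.g. facebook/rag-sequence-nq) follow the library's published RAG example and may differ across library versions; use_dummy_dataset=True swaps in a small toy retrieval index instead of the full Wikipedia index.

    from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

    # Checkpoint identifier as published alongside the Transformers RAG integration.
    tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")

    # use_dummy_dataset=True loads a tiny illustrative index; the real setup uses
    # the full DPR-indexed Wikipedia passages.
    retriever = RagRetriever.from_pretrained(
        "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
    )
    model = RagSequenceForGeneration.from_pretrained(
        "facebook/rag-sequence-nq", retriever=retriever
    )

    inputs = tokenizer("who wrote the origin of species", return_tensors="pt")
    generated = model.generate(input_ids=inputs["input_ids"])
    print(tokenizer.batch_decode(generated, skip_special_tokens=True))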
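
For the Open Datasets row, several of the listed QA datasets and the DPR-indexed Wikipedia passages are also distributed on the Hugging Face Hub. The identifiers below are assumptions for illustration, and the hosted copies may not correspond exactly to the splits or the December 2018 dump used in the paper.

    from datasets import load_dataset

    # Assumed Hub identifiers; downloads are large and the hosted versions may
    # differ from the exact data used in the paper.
    nq = load_dataset("natural_questions", split="validation")
    tqa = load_dataset("trivia_qa", "unfiltered.nocontext", split="validation")

    # Retrieval corpus: 100-word Wikipedia passages with DPR embeddings
    # ("wiki_dpr" is an assumed identifier for the December 2018 dump).
    wiki = load_dataset("wiki_dpr", "psgs_w100.nq.exact", split="train")

    print(len(nq), len(tqa), len(wiki))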
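
For the Software Dependencies row, the retriever's maximum inner product search runs on FAISS (unversioned in the paper). The following is an illustrative sketch using an exact flat inner-product index with placeholder embeddings, not the paper's actual configuration, which indexes roughly 21M Wikipedia passages and typically uses an approximate index.

    import numpy as np
    import faiss  # the paper does not pin a FAISS version

    d = 768                                                  # DPR embedding dimension
    passages = np.random.rand(10000, d).astype("float32")    # placeholder passage embeddings

    index = faiss.IndexFlatIP(d)                             # exact maximum inner product search
    index.add(passages)

    query = np.random.rand(1, d).astype("float32")           # placeholder query embedding
    scores, doc_ids = index.search(query, 5)                 # top-5 passage ids and scores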
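
Finally, for the Experiment Setup row, the quoted objective marginalizes the generator likelihood over the top-k retrieved documents before taking the negative log. A minimal PyTorch-style sketch of that loss, assuming the document prior log-probabilities and per-document sequence log-likelihoods have already been computed, is:

    import torch

    def rag_sequence_nll(doc_log_prior, seq_log_lik):
        """Negative marginal log-likelihood: -log sum_z p(z|x) * p(y|x, z).

        doc_log_prior: (batch, k) log p(z|x), e.g. log-softmax of retrieval scores
                       over the top-k retrieved documents
        seq_log_lik:   (batch, k) log p(y|x, z), the target sequence log-likelihood
                       under the generator given each retrieved document
        """
        marginal = torch.logsumexp(doc_log_prior + seq_log_lik, dim=-1)  # log p(y|x)
        return -marginal.mean()                                          # average over the batch

    # Usage with the optimizer named in the quote (learning rate and schedule are
    # not given here and would come from the paper's training details):
    # optimizer = torch.optim.Adam(model.parameters())
    # loss = rag_sequence_nll(doc_log_prior, seq_log_lik)
    # loss.backward(); optimizer.step()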