Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Authors: Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We fine-tune and evaluate our models on a wide range of knowledge-intensive NLP tasks and set the state of the art on three open domain QA tasks, outperforming parametric seq2seq models and task-specific retrieve-and-extract architectures. |
| Researcher Affiliation | Collaboration | Facebook AI Research; University College London; New York University; plewis@fb.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to run experiments with RAG has been open-sourced as part of the Hugging Face Transformers Library [66] and can be found at https://github.com/huggingface/transformers/blob/master/examples/rag/. An interactive demo of RAG models can be found at https://huggingface.co/rag/ (a hedged usage sketch appears after the table). |
| Open Datasets | Yes | We consider four popular open-domain QA datasets: Natural Questions (NQ) [29], Trivia QA (TQA) [24], Web Questions (WQ) [3] and Curated Trec (CT) [2]... We use the MSMARCO NLG task v2.1 [43]... We use the splits from Search QA [10]... FEVER [56]... We use a single Wikipedia dump for our non-parametric knowledge source. Following Lee et al. [31] and Karpukhin et al. [26], we use the December 2018 dump. |
| Dataset Splits | Yes | We consider k ∈ {5, 10} for training and set k for test time using dev data. |
| Hardware Specification | No | The paper discusses the models and datasets used but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) on which the experiments were run. |
| Software Dependencies | No | The paper mentions software components like Hugging Face Transformers Library and FAISS but does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Given a fine-tuning training corpus of input/output pairs (x_j, y_j), we minimize the negative marginal log-likelihood of each target, ∑_j −log p(y_j|x_j), using stochastic gradient descent with Adam [28]... We consider k ∈ {5, 10} for training and set k for test time using dev data. (A minimal sketch of this marginalized objective appears after the table.) |
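
The repository linked in the Open Source Code row ships RAG as standard Transformers classes. The snippet below is a minimal usage sketch, assuming the `facebook/rag-sequence-nq` checkpoint and the library's small dummy retrieval index are available; exact class and argument names may differ between Transformers versions.

```python
# Minimal sketch: load the open-sourced RAG model via Hugging Face Transformers.
# Uses use_dummy_dataset=True so the example runs without the full Wikipedia index.
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained(
    "facebook/rag-sequence-nq", retriever=retriever
)

inputs = tokenizer("who won the nobel prize in physics in 1921", return_tensors="pt")
generated = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```

With a full Wikipedia index configured instead of the dummy dataset, the same calls correspond to the open-domain QA setting evaluated in the paper.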
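
The Experiment Setup row quotes the fine-tuning objective: the negative marginal log-likelihood ∑_j −log p(y_j|x_j), where p(y|x) marginalizes the generator over the top-k retrieved documents. Below is a minimal PyTorch sketch of that marginalization for a single example; `rag_sequence_nll` is a hypothetical helper written for illustration, not code from the released repository, and the released code additionally distinguishes per-sequence (RAG-Sequence) from per-token (RAG-Token) marginalization.

```python
# Hypothetical helper illustrating the marginal NLL for one (x, y) pair.
# doc_scores:    (k,) retriever scores for the top-k retrieved documents z.
# gen_log_probs: (k,) generator log p(y | x, z) for the same target y under each z.
import torch

def rag_sequence_nll(doc_scores: torch.Tensor, gen_log_probs: torch.Tensor) -> torch.Tensor:
    log_p_z = torch.log_softmax(doc_scores, dim=-1)            # log p(z | x)
    return -torch.logsumexp(log_p_z + gen_log_probs, dim=-1)   # -log sum_z p(z|x) p(y|x,z)

# Toy example with k = 5 retrieved documents (the paper trains with k in {5, 10}).
doc_scores = torch.randn(5, requires_grad=True)      # stand-in retriever scores
gen_log_probs = torch.randn(5, requires_grad=True)   # stand-in generator log-likelihoods
loss = rag_sequence_nll(doc_scores, gen_log_probs)
loss.backward()  # gradients reach both the retriever scores and the generator terms,
                 # which is what lets the query encoder and generator be trained jointly
```

Per the quoted setup, these per-example losses are summed over the fine-tuning corpus and minimized with Adam.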