reproducibilityindex.ai

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Authors: Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela

NeurIPS 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We fine-tune and evaluate our models on a wide range of knowledge-intensive NLP tasks and set the state of the art on three open domain QA tasks, outperforming parametric seq2seq models and task-specific retrieve-and-extract architectures.
Researcher Affiliation	Collaboration	Facebook AI Research; University College London; New York University; plewis@fb.com
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code	Yes	Code to run experiments with RAG has been open-sourced as part of the Hugging Face Transformers Library [66] and can be found at https://github.com/huggingface/transformers/blob/master/examples/rag/. An interactive demo of RAG models can be found at https://huggingface.co/rag/
Open Datasets	Yes	We consider four popular open-domain QA datasets: Natural Questions (NQ) [29], TriviaQA (TQA) [24], WebQuestions (WQ) [3] and CuratedTrec (CT) [2]... We use the MSMARCO NLG task v2.1 [43]... We use the splits from SearchQA [10]... FEVER [56]... We use a single Wikipedia dump for our non-parametric knowledge source. Following Lee et al. [31] and Karpukhin et al. [26], we use the December 2018 dump.
Dataset Splits	Yes	We consider k ∈ {5, 10} for training and set k for test time using dev data.
Hardware Specification	No	The paper discusses the models and datasets used but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) on which the experiments were run.
Software Dependencies	No	The paper mentions software components like Hugging Face Transformers Library and FAISS but does not specify version numbers for these or other software dependencies.
Experiment Setup	Yes	Given a fine-tuning training corpus of input/output pairs (x_j, y_j), we minimize the negative marginal log-likelihood of each target, ∑_j −log p(y_j | x_j) using stochastic gradient descent with Adam [28]... We consider k ∈ {5, 10} for training and set k for test time using dev data.
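
The training objective quoted in the Experiment Setup row can be illustrated with a small sketch. It assumes sequence-level marginalization over the k retrieved documents, i.e. p(y_j | x_j) ≈ ∑_z p(z | x_j) p(y_j | x_j, z), and works on precomputed log-probabilities. The function names (`log_sum_exp`, `marginal_nll`, `corpus_loss`) and the toy numbers are hypothetical and are not taken from the paper or from the released code.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def marginal_nll(doc_log_probs, seq_log_probs):
    """Negative marginal log-likelihood for one (x_j, y_j) pair.

    doc_log_probs[i]: log p(z_i | x_j) for the i-th of k retrieved documents
    seq_log_probs[i]: log p(y_j | x_j, z_i), the summed token log-probs
    Assumption: the marginal is taken once per sequence over the k documents.
    """
    if len(doc_log_probs) != len(seq_log_probs):
        raise ValueError("need one sequence log-prob per retrieved document")
    joint = [d + s for d, s in zip(doc_log_probs, seq_log_probs)]
    return -log_sum_exp(joint)


def corpus_loss(examples):
    """Sum of per-example losses: sum_j -log p(y_j | x_j)."""
    return sum(marginal_nll(d, s) for d, s in examples)


# Toy example with k = 2 documents (made-up probabilities).
doc_lp = [math.log(0.5), math.log(0.5)]
seq_lp = [math.log(0.2), math.log(0.4)]
loss = marginal_nll(doc_lp, seq_lp)  # equals -log(0.5*0.2 + 0.5*0.4)
```

In an actual training run these log-probabilities would come from the retriever and generator, and the loss would be minimized with Adam as the row states; the sketch only shows how the per-example marginal is formed.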