Autoregressive Search Engines: Generating Substrings as Document Identifiers

Authors: Michele Bevilacqua, Giuseppe Ottaviano, Patrick Lewis, Scott Yih, Sebastian Riedel, Fabio Petroni

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show this not only outperforms prior autoregressive approaches but also leads to an average improvement of at least 10 points over more established retrieval solutions for passage-level retrieval on the KILT benchmark, establishing new state-of-the-art downstream performance on some datasets, while using a considerably lighter memory footprint than competing systems.
Researcher Affiliation | Collaboration | Sapienza University of Rome; Meta AI; University College London
Pseudocode | No | The paper describes its methods in prose and with mathematical formulas, but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and pre-trained models are available at https://github.com/facebookresearch/SEAL.
Open Datasets | Yes | Natural Questions (NQ) is a dataset containing query-document pairs, where the query is a question... We experiment on both the customary retrieval setup used by, among others, Karpukhin et al. [2020] and Mao et al. [2021]... KILT is a comprehensive benchmark collecting different datasets including question answering, fact checking, dialogue, slot filling, and entity linking [Petroni et al., 2021].
Dataset Splits | Yes | We train FiD on training set predictions... NQ320k is a much more restricted setting, in which the retrieval set is limited to the union of all ground truth documents in the training, dev or test set. Different revisions of the same Wikipedia page count as different documents.
Hardware Specification | No | The paper mentions running experiments on 'our 1 GPU evaluation setup' but does not specify the model or type of GPU. Furthermore, in the checklist, the authors explicitly state 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No]'.
Software Dependencies | No | The paper mentions software like 'C++ FM-index implementation in sdsl-lite', 'fairseq library', and 'pyserini' but does not specify version numbers for these dependencies.
Experiment Setup | Yes | We finetune BART large [Lewis et al., 2019] to generate ngrams of length k = 10 from the ground truth document... where L is the Levenshtein distance and τ (= 1.5 in our experiments) is a temperature parameter, controlling the peakiness of the distribution... We report training and inference hyperparameters in Appendix (A).
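
The Experiment Setup row quotes the paper's description of weighting candidate n-grams by a Levenshtein distance L softened with a temperature τ, where lower τ makes the distribution peakier. A minimal sketch of that kind of temperature-controlled weighting (the function names and the exact softmax normalization here are illustrative assumptions, not the paper's implementation):

```python
import math

def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def ngram_weights(target: str, ngrams: list[str], tau: float = 1.5) -> list[float]:
    # Hypothetical weighting: softmax over -L(ngram, target)/tau.
    # Smaller tau sharpens the distribution around the closest n-gram,
    # matching the "peakiness" role the quoted text assigns to tau.
    scores = [-levenshtein(target, g) / tau for g in ngrams]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

With τ = 1.5 an exact match still dominates near-misses, while a very large τ would flatten the weights toward uniform.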