Transformer Memory as a Differentiable Search Index

Authors: Yi Tay, Vinh Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that, given appropriate design choices, DSI significantly outperforms strong baselines such as dual encoder models. Moreover, DSI demonstrates strong generalization capabilities, outperforming a BM25 baseline in a zero-shot setup. In this section, we discuss our experimental setup, datasets used and baselines compared. We also discuss experimental results, findings and effect of various strategies discussed in earlier sections of the paper.
Researcher Affiliation | Industry | Google Research {yitay,vqtran,metzler}@google.com
Pseudocode | Yes | Algorithm 1: Generating semantically structured identifiers (referenced in Section 3.2; see the clustering sketch after this table).
Open Source Code | No | The paper states it uses the 'Jax/T5X implementation' and provides a GitHub link for T5X, an open-source framework. However, it does not state that the specific DSI code developed for this paper is open source, nor does it provide a link to it.
Open Datasets | Yes | We conduct our experiments on the challenging Natural Questions (NQ) (Kwiatkowski et al., 2019) dataset. (See the loading example after this table.)
Dataset Splits | Yes | NQ consists of 307K query-document training pairs and 8K validation pairs, where the queries are natural language questions and the documents are Wikipedia articles. NQ320K is the full NQ set and uses its predetermined training and validation split for evaluation. Unlike NQ320K, NQ10K and NQ100K use randomly sampled validation sets (see the sampling sketch after this table).
Hardware Specification | Yes | Our training hardware consists of 128-256 TPUv4 chips for models above 1B parameters and 64-128 TPUv3 or TPUv4 chips otherwise.
Software Dependencies | No | The paper mentions using 'the Jax/T5X implementation for our experiments' but does not specify version numbers for Jax, T5X, or any other software dependencies.
Experiment Setup | Yes | The DSI models are trained for a maximum of 1M steps using a batch size of 128. We pick the best checkpoint based on retrieval validation performance. We tune the learning rate amongst {0.001, 0.0005} and linear warmup amongst {10K, 100K, 200K, 300K} and/or none. (Summarized in the configuration sketch after this table.)
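
The pseudocode noted above (Algorithm 1) builds semantically structured docids by recursively clustering document embeddings. The sketch below is a minimal illustration of that idea, not the authors' code: the embedding source, cluster count, and leaf-size threshold are assumptions based on the paper's description in Section 3.2, and the function name is ours.

```python
import numpy as np
from sklearn.cluster import KMeans

def assign_semantic_docids(embeddings, doc_indices=None, k=10, c=100, prefix=""):
    """Recursively cluster document embeddings into k groups and build
    hierarchical docid strings, splitting any cluster that still holds
    more than c documents."""
    if doc_indices is None:
        doc_indices = np.arange(len(embeddings))
    if len(doc_indices) <= c:
        # Cluster is small enough: append a within-cluster position to the prefix.
        return {int(doc): prefix + str(pos) for pos, doc in enumerate(doc_indices)}
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(embeddings[doc_indices])
    docids = {}
    for cluster_id in range(k):
        members = doc_indices[labels == cluster_id]
        if len(members) > 0:
            docids.update(assign_semantic_docids(embeddings, members, k, c,
                                                 prefix + str(cluster_id)))
    return docids

# Toy usage: random vectors stand in for the document embeddings.
rng = np.random.default_rng(0)
semantic_ids = assign_semantic_docids(rng.normal(size=(5000, 32)))
```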
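
The paper does not describe how NQ was obtained or preprocessed. As one possible starting point (an assumption, not the authors' pipeline), the public NQ release can be loaded through the Hugging Face datasets library; note that the full download is large.

```python
from datasets import load_dataset

# Hypothetical loading path; the paper does not describe its NQ preprocessing.
nq_train = load_dataset("natural_questions", split="train")       # ~307K examples
nq_val = load_dataset("natural_questions", split="validation")    # ~8K examples
print(nq_train[0]["question"]["text"])  # the natural-language query
```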
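
For NQ10K and NQ100K, the quoted text only says the validation sets are randomly sampled; the helper below is our own illustration of such a split, with hypothetical sizes, not the authors' sampling procedure.

```python
import random

def sample_split(pairs, num_train, num_val, seed=0):
    """Draw a random train/validation split from a list of (query, docid) pairs."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    return shuffled[:num_train], shuffled[num_train:num_train + num_val]

# Hypothetical usage for an NQ10K-style subset (sizes illustrative):
# train_pairs, val_pairs = sample_split(all_nq_pairs, num_train=10_000, num_val=1_000)
```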
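
Finally, the reported experiment setup can be condensed into a configuration sketch. The field names below are ours; only the values come from the quoted setup.

```python
# Hypothetical configuration summary; field names are illustrative.
dsi_train_config = {
    "max_steps": 1_000_000,
    "batch_size": 128,
    "learning_rate_sweep": [1e-3, 5e-4],
    "linear_warmup_steps_sweep": [None, 10_000, 100_000, 200_000, 300_000],
    "checkpoint_selection": "best retrieval validation performance",
}
```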