Transformer Memory as a Differentiable Search Index

Authors: Yi Tay, Vinh Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that, given appropriate design choices, DSI significantly outperforms strong baselines such as dual encoder models. Moreover, DSI demonstrates strong generalization capabilities, outperforming a BM25 baseline in a zero-shot setup. In this section, we discuss our experimental setup, datasets used and baselines compared. We also discuss experimental results, findings and effect of various strategies discussed in earlier sections of the paper.
Researcher Affiliation | Industry | Google Research {yitay,vqtran,metzler}@google.com
Pseudocode | Yes | Algorithm 1: Generating semantically structured identifiers (referenced in Section 3.2; see the clustering sketch after this table).
Open Source Code | No | The paper states it uses the 'Jax/T5X implementation' and provides a GitHub link for T5X, an open-source framework. However, it does not state that the specific DSI code developed for this paper is open source, nor does it provide a link to it.
Open Datasets | Yes | We conduct our experiments on the challenging Natural Questions (NQ) (Kwiatkowski et al., 2019) dataset. (See the loading example after this table.)
Dataset Splits | Yes | NQ consists of 307K query-document training pairs and 8K validation pairs, where the queries are natural language questions and the documents are Wikipedia articles. NQ320K is the full NQ set and uses its predetermined training and validation split for evaluation. Unlike NQ320K, NQ10K and NQ100K use randomly sampled validation sets (see the sampling sketch after this table).
Hardware Specification | Yes | Our training hardware consists of 128-256 TPUv4 chips for models above 1B parameters and 64-128 TPUv3 or TPUv4 chips otherwise.
Software Dependencies | No | The paper mentions using 'the Jax/T5X implementation for our experiments' but does not specify version numbers for Jax, T5X, or any other software dependencies.
Experiment Setup | Yes | The DSI models are trained for a maximum of 1M steps using a batch size of 128. We pick the best checkpoint based on retrieval validation performance. We tune the learning rate amongst {0.001, 0.0005} and linear warmup amongst {10K, 100K, 200K, 300K} and/or none. (Summarized in the configuration sketch after this table.)
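
The pseudocode noted above (Algorithm 1) builds semantically structured docids by recursively clustering document embeddings. The sketch below is a minimal illustration of that idea, not the authors' code: the embedding source, cluster count, and leaf-size threshold are assumptions based on the paper's description in Section 3.2, and the function name is ours.

```python
import numpy as np
from sklearn.cluster import KMeans

def assign_semantic_docids(embeddings, doc_indices=None, k=10, c=100, prefix=""):
    """Recursively cluster document embeddings into k groups and build
    hierarchical docid strings, splitting any cluster that still holds
    more than c documents."""
    if doc_indices is None:
        doc_indices = np.arange(len(embeddings))
    if len(doc_indices) <= c:
        # Cluster is small enough: append a within-cluster position to the prefix.
        return {int(doc): prefix + str(pos) for pos, doc in enumerate(doc_indices)}
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(embeddings[doc_indices])
    docids = {}
    for cluster_id in range(k):
        members = doc_indices[labels == cluster_id]
        if len(members) > 0:
            docids.update(assign_semantic_docids(embeddings, members, k, c,
                                                 prefix + str(cluster_id)))
    return docids

# Toy usage: random vectors stand in for the document embeddings.
rng = np.random.default_rng(0)
semantic_ids = assign_semantic_docids(rng.normal(size=(5000, 32)))
```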
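
The paper does not describe how NQ was obtained or preprocessed. As one possible starting point (an assumption, not the authors' pipeline), the public NQ release can be loaded through the Hugging Face datasets library; note that the full download is large.

```python
from datasets import load_dataset

# Hypothetical loading path; the paper does not describe its NQ preprocessing.
nq_train = load_dataset("natural_questions", split="train")       # ~307K examples
nq_val = load_dataset("natural_questions", split="validation")    # ~8K examples
print(nq_train[0]["question"]["text"])  # the natural-language query
```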
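
For NQ10K and NQ100K, the quoted text only says the validation sets are randomly sampled; the helper below is our own illustration of such a split, with hypothetical sizes, not the authors' sampling procedure.

```python
import random

def sample_split(pairs, num_train, num_val, seed=0):
    """Draw a random train/validation split from a list of (query, docid) pairs."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    return shuffled[:num_train], shuffled[num_train:num_train + num_val]

# Hypothetical usage for an NQ10K-style subset (sizes illustrative):
# train_pairs, val_pairs = sample_split(all_nq_pairs, num_train=10_000, num_val=1_000)
```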
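
Finally, the reported experiment setup can be condensed into a configuration sketch. The field names below are ours; only the values come from the quoted setup.

```python
# Hypothetical configuration summary; field names are illustrative.
dsi_train_config = {
    "max_steps": 1_000_000,
    "batch_size": 128,
    "learning_rate_sweep": [1e-3, 5e-4],
    "linear_warmup_steps_sweep": [None, 10_000, 100_000, 200_000, 300_000],
    "checkpoint_selection": "best retrieval validation performance",
}
```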