Rethinking the Role of Token Retrieval in Multi-Vector Retrieval

Authors: Jinhyuk Lee, Zhuyun Dai, Sai Meher Karthik Duddu, Tao Lei, Iftekhar Naim, Ming-Wei Chang, Vincent Zhao

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "On the popular BEIR benchmark, XTR advances the state-of-the-art by 2.8 nDCG@10 without any distillation. Detailed analysis confirms our decision to revisit the token retrieval stage, as XTR demonstrates much better recall of the token retrieval stage compared to ColBERT." (see also Section 4, Experiments) |
| Researcher Affiliation | Industry | Jinhyuk Lee, Zhuyun Dai, Sai Meher Karthik Duddu, Tao Lei, Iftekhar Naim, Ming-Wei Chang, Vincent Y. Zhao (Google DeepMind) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about, or link to, open-source code for the described methodology. |
| Open Datasets | Yes | "For the zero-shot evaluation, we use 13 datasets from BEIR [Thakur et al., 2021] (see Appendix C for acronyms), 12 datasets from LoTTE [Santhanam et al., 2022b], and 4 datasets on open-domain QA passage retrieval (EQ: EntityQuestions [Sciavolino et al., 2021], NQ, TQA: TriviaQA, SQD: SQuAD). We also train multilingual XTR (mXTR) and evaluate it on MIRACL [Zhang et al., 2022b], which contains retrieval tasks in 18 languages." |
| Dataset Splits | Yes | "We fine-tune XTR on MS MARCO with a fixed set of hard negatives from RocketQA [Qu et al., 2021]. Then, we test XTR on MS MARCO (MS; in-domain) and zero-shot IR datasets." and "We report MRR@10 and Recall@1000 on the MS MARCO development set." |
| Hardware Specification | Yes | "Up to 256 chips of TPU v3 accelerator were used depending on the size of the model." |
| Software Dependencies | No | The paper mentions using ScaNN for MIPS and initializing from T5/mT5 models, but does not provide specific version numbers for these or for other software dependencies such as Python, PyTorch/TensorFlow, or CUDA. (A usage sketch of the ScaNN dependency follows the table.) |
| Experiment Setup | Yes | "We fine-tune XTR for 50,000 iterations with the learning rate set to 1e-3." and "In our experiments, we tried k_train = {32, 64, 128, 256, 320} for each batch size and choose the best model based on their performance on the MS MARCO development set." and "For inference, XTR uses k for the token retrieval. We use k = 40,000." (These settings are collected in the configuration sketch below.) |
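
To make the quoted setup easier to scan, the sketch below collects the reported hyperparameters in one place. Only the numeric values come from the excerpts in the table; the dataclass names and field layout are illustrative assumptions, not code from the paper.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical containers for the settings quoted in the table above.
# Values are taken from the paper excerpts; the structure is an assumption.

@dataclass
class XTRFineTuneConfig:
    train_steps: int = 50_000               # "fine-tune XTR for 50,000 iterations"
    learning_rate: float = 1e-3             # learning rate set to 1e-3
    hard_negative_source: str = "RocketQA"  # fixed hard negatives (Dataset Splits row)
    k_train_sweep: List[int] = field(
        default_factory=lambda: [32, 64, 128, 256, 320]
    )  # tried per batch size; best model chosen on the MS MARCO dev set

@dataclass
class XTRInferenceConfig:
    k_token_retrieval: int = 40_000         # top-k retrieved tokens at inference
```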
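
Since the assessment flags ScaNN as an unversioned dependency, the following is a hedged sketch of typical usage of the open-source `scann` Python package for maximum inner product search over token embeddings. The index parameters and the random arrays are placeholders chosen for illustration; the paper does not specify its ScaNN configuration or version.

```python
import numpy as np
import scann  # version unspecified in the paper

# Placeholder token embeddings and query-token embeddings (random, illustration only).
token_embeddings = np.random.rand(100_000, 128).astype(np.float32)
query_tokens = np.random.rand(8, 128).astype(np.float32)

# Build a MIPS index; "dot_product" makes this maximum inner product search.
# All index parameters below are illustrative, not the paper's settings.
searcher = (
    scann.scann_ops_pybind.builder(token_embeddings, 100, "dot_product")
    .tree(num_leaves=1_000, num_leaves_to_search=100, training_sample_size=50_000)
    .score_ah(2, anisotropic_quantization_threshold=0.2)
    .reorder(200)
    .build()
)

# Retrieve the top-scoring database tokens for each query token.
neighbors, scores = searcher.search_batched(query_tokens)
print(neighbors.shape, scores.shape)
```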