Rethinking the Role of Token Retrieval in Multi-Vector Retrieval
Authors: Jinhyuk Lee, Zhuyun Dai, Sai Meher Karthik Duddu, Tao Lei, Iftekhar Naim, Ming-Wei Chang, Vincent Zhao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the popular BEIR benchmark, XTR advances the state-of-the-art by 2.8 nDCG@10 without any distillation. Detailed analysis confirms our decision to revisit the token retrieval stage, as XTR demonstrates much better recall of the token retrieval stage compared to ColBERT. (See also Section 4, Experiments.) |
| Researcher Affiliation | Industry | Jinhyuk Lee, Zhuyun Dai, Sai Meher Karthik Duddu, Tao Lei, Iftekhar Naim, Ming-Wei Chang, Vincent Y. Zhao, Google DeepMind |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | For the zero-shot evaluation, we use 13 datasets from BEIR [Thakur et al., 2021] (see Appendix C for acronyms), 12 datasets from LoTTE [Santhanam et al., 2022b], and 4 datasets on open-domain QA passage retrieval (EQ: EntityQuestions [Sciavolino et al., 2021], NQ, TQA: TriviaQA, SQD: SQuAD). We also train multilingual XTR (mXTR) and evaluate it on MIRACL [Zhang et al., 2022b], which contains retrieval tasks in 18 languages. |
| Dataset Splits | Yes | We fine-tune XTR on MS MARCO with a fixed set of hard negatives from RocketQA [Qu et al., 2021]. Then, we test XTR on MS MARCO (MS; in-domain) and zero-shot IR datasets. and We report MRR@10 and Recall@1000 on the MS MARCO development set. |
| Hardware Specification | Yes | Up to 256 chips of TPU v3 accelerator were used depending on the size of the model. |
| Software Dependencies | No | The paper mentions using ScaNN for MIPS, and initializing from T5/mT5 models, but does not provide specific version numbers for these or other software dependencies like Python, PyTorch/TensorFlow, or CUDA. |
| Experiment Setup | Yes | We fine-tune XTR for 50,000 iterations with the learning rate set to 1e-3. and In our experiments, we tried ktrain = {32, 64, 128, 256, 320} for each batch size and choose the best model based on their performance on the MS MARCO development set. and For inference, XTR uses k for the token retrieval. We use k = 40,000. |
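The Experiment Setup row above describes retrieving the top-k corpus tokens per query token by maximum inner product (the paper uses ScaNN with k = 40,000 at corpus scale). As a minimal illustration only, not the authors' implementation, here is a brute-force version in pure Python; the function name and toy embeddings are hypothetical.

```python
# Hypothetical sketch of per-query-token MIPS retrieval: score every
# corpus token embedding by inner product and keep the top k. XTR does
# this approximately with ScaNN over the full corpus (k = 40,000).

def top_k_inner_product(query_token, corpus_tokens, k):
    """Return (index, score) pairs for the k corpus tokens with the
    highest inner product against one query token embedding."""
    scores = [
        (i, sum(q * c for q, c in zip(query_token, vec)))
        for i, vec in enumerate(corpus_tokens)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

# Toy example: one 2-d query token against four corpus token embeddings.
corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
query = [1.0, 0.2]
print(top_k_inner_product(query, corpus, k=2))  # indices 0 and 2 score highest
```

In the actual system the exhaustive scan is replaced by an approximate MIPS index, but the retrieved (index, score) pairs play the same role: they are the token-level candidates whose documents are then scored.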