Adversarial Retriever-Ranker for Dense Text Retrieval

Authors: Hang Zhang, Yeyun Gong, Yelong Shen, Jiancheng Lv, Nan Duan, Weizhu Chen

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate AR2 on three benchmarks. Experimental results show that AR2 consistently and significantly outperforms existing dense retriever methods and achieves new state-of-the-art results on all of them.
Researcher Affiliation | Collaboration | College of Computer Science, Sichuan University; Microsoft Research Asia; Microsoft Azure AI. hangzhang scu@foxmail.com, {yegong,yelong.shen}@microsoft.com, lvjiancheng@scu.edu.cn, {nanduan,wzchen}@microsoft.com
Pseudocode | Yes | Algorithm 1: Adversarial Retriever-Ranker (AR2)
Open Source Code | Yes | Code and models are available at https://github.com/microsoft/AR2.
Open Datasets | Yes | We conduct experiments on three popular benchmarks: Natural Questions (Kwiatkowski et al., 2019), Trivia QA (Joshi et al., 2017), and MS-MARCO Passage Ranking (Nguyen et al., 2016).
Dataset Splits | No | The paper mentions using a "dev set" for MS-MARCO and test sets for NQ and Trivia QA, but it does not specify explicit train/validation/test split percentages, per-split sample counts, or the methodology used to generate the splits. Hyperparameters are provided in Appendix A.3, but the dataset-splitting details needed for reproducibility are not explicitly stated.
Hardware Specification | Yes | All the experiments in this work run on 8 NVIDIA Tesla A100 GPUs.
Software Dependencies | No | The implementation code of AR2 is based on Huggingface Transformers (Wolf et al., 2020), utilizing gradient checkpointing (Chen et al., 2016), Apex, and gradient accumulation to reduce GPU memory consumption. While it names Huggingface Transformers and Apex, it does not specify version numbers for these libraries or for other critical software dependencies such as Python or PyTorch. (A memory-saving training sketch follows the table below.)
Experiment Setup | Yes | The number of training iterations is set to 10. During each iteration of training, the retriever model is scheduled to train with 1500 mini-batches, while the ranker model is scheduled to train with 500 mini-batches. The document index is refreshed after each iteration of training. The other hyper-parameters are shown in Appendix A.3. (A skeleton of this alternating schedule follows the table below.)
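
A minimal, assumption-laden sketch of the memory-saving setup named in the Software Dependencies row (Huggingface Transformers with gradient checkpointing and gradient accumulation). This is not the authors' code: the encoder name, optimizer settings, accumulation step count, and the stand-in loss are illustrative, and Apex mixed precision is left out.

```python
# Illustrative sketch only: gradient checkpointing + gradient accumulation with
# Huggingface Transformers. Encoder, learning rate, accumulation steps, and the
# stand-in loss are assumptions, not values from the AR2 paper or repository.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed encoder
model = AutoModel.from_pretrained("bert-base-uncased")
model.gradient_checkpointing_enable()  # recompute activations in backward to cut memory
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)      # assumed optimizer settings
accumulation_steps = 8                 # assumed; effective batch = micro-batch * 8

queries = ["what is dense passage retrieval?"] * 16             # toy data
for step, text in enumerate(queries):
    batch = tokenizer(text, return_tensors="pt")
    hidden = model(**batch).last_hidden_state
    loss = hidden.pow(2).mean()        # stand-in loss; AR2 uses its own retrieval objectives
    (loss / accumulation_steps).backward()                       # average grads over micro-batches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()               # one update per accumulated "large" batch
        optimizer.zero_grad()
```

In such a loop the effective batch size is the per-step micro-batch multiplied by accumulation_steps, which is how a large contrastive batch can fit on limited GPU memory.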
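
The alternating retriever/ranker schedule quoted in the Experiment Setup row can be summarized as the skeleton below. Only the counts (10 iterations, 1500 retriever and 500 ranker mini-batches per iteration, index refresh after each iteration) come from the paper; every function is a hypothetical placeholder rather than part of the released AR2 repository.

```python
# Skeleton of the alternating adversarial schedule, NOT the released AR2 code.
NUM_ITERATIONS = 10       # outer adversarial iterations (from the paper)
RETRIEVER_STEPS = 1500    # retriever mini-batches per iteration (from the paper)
RANKER_STEPS = 500        # ranker mini-batches per iteration (from the paper)

def next_batch():                   # hypothetical: yields one training mini-batch
    return None

def train_retriever_step(batch):    # hypothetical: one retriever update (tries to fool the ranker)
    pass

def train_ranker_step(batch):       # hypothetical: one ranker update (separates positives from retrieved negatives)
    pass

def rebuild_index():                # hypothetical: re-encode the corpus and refresh the ANN index
    pass

for iteration in range(NUM_ITERATIONS):
    for _ in range(RETRIEVER_STEPS):
        train_retriever_step(next_batch())
    for _ in range(RANKER_STEPS):
        train_ranker_step(next_batch())
    rebuild_index()                 # "The document index is refreshed after each iteration of training."
```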