Adversarial Retriever-Ranker for Dense Text Retrieval
Authors: Hang Zhang, Yeyun Gong, Yelong Shen, Jiancheng Lv, Nan Duan, Weizhu Chen
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate AR2 on three benchmarks. Experimental results show that AR2 consistently and significantly outperforms existing dense retriever methods and achieves new state-of-the-art results on all of them. |
| Researcher Affiliation | Collaboration | ¹College of Computer Science, Sichuan University; ²Microsoft Research Asia; ³Microsoft Azure AI. hangzhang_scu@foxmail.com, {yegong,yelong.shen}@microsoft.com, lvjiancheng@scu.edu.cn, {nanduan,wzchen}@microsoft.com |
| Pseudocode | Yes | Algorithm 1 Adversarial Retriever-Ranker (AR2) (a hedged sketch of the two alternating updates appears below the table) |
| Open Source Code | Yes | Code and models are available at https://github.com/microsoft/AR2. |
| Open Datasets | Yes | We conduct experiments on three popular benchmarks: Natural Questions (Kwiatkowski et al., 2019), Trivia QA (Joshi et al., 2017), and MS-MARCO Passage Ranking (Nguyen et al., 2016). |
| Dataset Splits | No | The paper mentions using a "dev set" for MS-MARCO and test sets for NQ and Trivia QA, but it does not give train/validation/test split percentages, per-split sample counts, or the procedure used to construct the splits for each dataset. Hyperparameters are provided in Appendix A.3, but the split details needed to reproduce the evaluation are not stated. |
| Hardware Specification | Yes | All the experiments in this work run on 8 NVIDIA Tesla A100 GPUs. |
| Software Dependencies | No | The implementation code of AR2 is based on Huggingface Transformers (Wolf et al., 2020) utilizing gradient checkpointing (Chen et al., 2016), Apex, and gradient accumulation to reduce GPU memory consumption. The paper names Huggingface Transformers and Apex but gives no version numbers for them, nor for other critical dependencies such as Python or PyTorch (a hedged setup sketch of these memory-saving techniques follows the table). |
| Experiment Setup | Yes | The number of training iterations is set to 10. During each iteration of training, the retriever model is scheduled to train with 1500 mini-batches, while the ranker model is scheduled to train with 500 mini-batches. The document index is refreshed after each iteration of training. The other hyper-parameters are shown in Appendix A.3 (this schedule is sketched as a loop below the table). |
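
To make the Pseudocode row concrete, below is a minimal sketch of the two alternating updates in an adversarial retriever-ranker setup; it is not the authors' released code. The contrastive cross-entropy for the ranker and the KL-style distillation surrogate for the retriever are assumptions for illustration (the paper derives its own gradient estimator for the retriever); see https://github.com/microsoft/AR2 for the exact objectives.

```python
import torch
import torch.nn.functional as F

def ranker_loss(ranker_scores: torch.Tensor) -> torch.Tensor:
    """Ranker update (assumed): cross-entropy pushing the positive passage
    (column 0) above the retriever-sampled hard negatives."""
    # ranker_scores: [batch, 1 + n_negatives]
    target = torch.zeros(ranker_scores.size(0), dtype=torch.long,
                         device=ranker_scores.device)
    return F.cross_entropy(ranker_scores, target)

def retriever_loss(retriever_scores: torch.Tensor,
                   ranker_scores: torch.Tensor) -> torch.Tensor:
    """Retriever update, sketched as matching the (frozen) ranker's
    distribution over the sampled passages -- a common surrogate for
    the adversarial objective, not the paper's exact estimator."""
    log_p_ret = F.log_softmax(retriever_scores, dim=-1)
    p_rank = F.softmax(ranker_scores.detach(), dim=-1)  # no ranker gradients
    return F.kl_div(log_p_ret, p_rank, reduction="batchmean")

# Shapes only; real scores come from the dual-encoder and the cross-encoder.
scores_ret = torch.randn(4, 16, requires_grad=True)   # [batch, 1 + negatives]
scores_rank = torch.randn(4, 16, requires_grad=True)
print(ranker_loss(scores_rank).item(),
      retriever_loss(scores_ret, scores_rank).item())
```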
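
Because the Software Dependencies row flags missing version pins, a reproducer has to reassemble the memory-saving setup the quote names. Here is a minimal sketch of the three techniques (gradient checkpointing, Apex mixed precision, gradient accumulation) using current Huggingface Transformers / PyTorch APIs; the encoder checkpoint, batch sizes, and accumulation factor are placeholders, and the exact library versions remain unverified.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")  # placeholder encoder
model.gradient_checkpointing_enable()  # recompute activations during backward

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
# Apex mixed precision, if NVIDIA Apex is installed:
# from apex import amp
# model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

data = TensorDataset(torch.randint(0, 1000, (64, 32)))  # fake token ids
loader = DataLoader(data, batch_size=4)

ACCUM_STEPS = 8  # gradient accumulation: larger effective batch per GPU
for step, (input_ids,) in enumerate(loader):
    out = model(input_ids=input_ids,
                attention_mask=torch.ones_like(input_ids))
    loss = out.last_hidden_state.mean()  # stand-in for a real training loss
    (loss / ACCUM_STEPS).backward()
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()
        optimizer.zero_grad()
```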
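
Finally, the schedule quoted in the Experiment Setup row maps onto a simple outer loop. Below is a sketch of that control flow with hypothetical helpers (train_retriever_batch, train_ranker_batch, refresh_index); the alternation order within an iteration is assumed, and only the iteration counts come from the paper.

```python
N_ITERATIONS = 10          # training iterations (from the paper)
RETRIEVER_BATCHES = 1500   # retriever mini-batches per iteration
RANKER_BATCHES = 500       # ranker mini-batches per iteration

def train_retriever_batch():  # hypothetical: one retriever gradient step
    pass

def train_ranker_batch():     # hypothetical: one ranker gradient step
    pass

def refresh_index():          # hypothetical: re-encode corpus, rebuild ANN index
    pass

for iteration in range(N_ITERATIONS):
    for _ in range(RETRIEVER_BATCHES):
        train_retriever_batch()
    for _ in range(RANKER_BATCHES):
        train_ranker_batch()
    refresh_index()  # "document index is refreshed after each iteration"
```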