Adversarial Retriever-Ranker for Dense Text Retrieval

Authors: Hang Zhang, Yeyun Gong, Yelong Shen, Jiancheng Lv, Nan Duan, Weizhu Chen

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate AR2 on three benchmarks. Experimental results show that AR2 consistently and significantly outperforms existing dense retriever methods and achieves new state-of-the-art results on all of them.
Researcher Affiliation | Collaboration | College of Computer Science, Sichuan University; Microsoft Research Asia; Microsoft Azure AI. hangzhang scu@foxmail.com, {yegong,yelong.shen}@microsoft.com, lvjiancheng@scu.edu.cn, {nanduan,wzchen}@microsoft.com
Pseudocode | Yes | Algorithm 1: Adversarial Retriever-Ranker (AR2)
Open Source Code | Yes | Code and models are available at https://github.com/microsoft/AR2.
Open Datasets | Yes | We conduct experiments on three popular benchmarks: Natural Questions (Kwiatkowski et al., 2019), Trivia QA (Joshi et al., 2017), and MS-MARCO Passage Ranking (Nguyen et al., 2016).
Dataset Splits | No | The paper mentions using a "dev set" for MS-MARCO and test sets for NQ and Trivia QA, but it does not specify explicit train/validation/test split percentages, per-split sample counts, or the methodology used to generate the splits. Hyperparameters are provided in Appendix A.3, but the dataset-splitting details needed for reproducibility are not explicitly stated.
Hardware Specification | Yes | All the experiments in this work run on 8 NVIDIA Tesla A100 GPUs.
Software Dependencies | No | The implementation code of AR2 is based on Huggingface Transformers (Wolf et al., 2020), utilizing gradient checkpointing (Chen et al., 2016), Apex, and gradient accumulation to reduce GPU memory consumption. While it names Huggingface Transformers and Apex, it does not specify version numbers for these libraries or for other critical software dependencies such as Python or PyTorch. (A memory-saving training sketch follows the table below.)
Experiment Setup | Yes | The number of training iterations is set to 10. During each iteration of training, the retriever model is scheduled to train with 1500 mini-batches, while the ranker model is scheduled to train with 500 mini-batches. The document index is refreshed after each iteration of training. The other hyper-parameters are shown in Appendix A.3. (A skeleton of this alternating schedule follows the table below.)
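
A minimal, assumption-laden sketch of the memory-saving setup named in the Software Dependencies row (Huggingface Transformers with gradient checkpointing and gradient accumulation). This is not the authors' code: the encoder name, optimizer settings, accumulation step count, and the stand-in loss are illustrative, and Apex mixed precision is left out.

```python
# Illustrative sketch only: gradient checkpointing + gradient accumulation with
# Huggingface Transformers. Encoder, learning rate, accumulation steps, and the
# stand-in loss are assumptions, not values from the AR2 paper or repository.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed encoder
model = AutoModel.from_pretrained("bert-base-uncased")
model.gradient_checkpointing_enable()  # recompute activations in backward to cut memory
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)      # assumed optimizer settings
accumulation_steps = 8                 # assumed; effective batch = micro-batch * 8

queries = ["what is dense passage retrieval?"] * 16             # toy data
for step, text in enumerate(queries):
    batch = tokenizer(text, return_tensors="pt")
    hidden = model(**batch).last_hidden_state
    loss = hidden.pow(2).mean()        # stand-in loss; AR2 uses its own retrieval objectives
    (loss / accumulation_steps).backward()                       # average grads over micro-batches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()               # one update per accumulated "large" batch
        optimizer.zero_grad()
```

In such a loop the effective batch size is the per-step micro-batch multiplied by accumulation_steps, which is how a large contrastive batch can fit on limited GPU memory.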
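
The alternating retriever/ranker schedule quoted in the Experiment Setup row can be summarized as the skeleton below. Only the counts (10 iterations, 1500 retriever and 500 ranker mini-batches per iteration, index refresh after each iteration) come from the paper; every function is a hypothetical placeholder rather than part of the released AR2 repository.

```python
# Skeleton of the alternating adversarial schedule, NOT the released AR2 code.
NUM_ITERATIONS = 10       # outer adversarial iterations (from the paper)
RETRIEVER_STEPS = 1500    # retriever mini-batches per iteration (from the paper)
RANKER_STEPS = 500        # ranker mini-batches per iteration (from the paper)

def next_batch():                   # hypothetical: yields one training mini-batch
    return None

def train_retriever_step(batch):    # hypothetical: one retriever update (tries to fool the ranker)
    pass

def train_ranker_step(batch):       # hypothetical: one ranker update (separates positives from retrieved negatives)
    pass

def rebuild_index():                # hypothetical: re-encode the corpus and refresh the ANN index
    pass

for iteration in range(NUM_ITERATIONS):
    for _ in range(RETRIEVER_STEPS):
        train_retriever_step(next_batch())
    for _ in range(RANKER_STEPS):
        train_ranker_step(next_batch())
    rebuild_index()                 # "The document index is refreshed after each iteration of training."
```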