Towards Robust Dense Retrieval via Local Ranking Alignment

Authors: Xuanang Chen, Jian Luo, Ben He, Le Sun, Yingfei Sun

IJCAI 2022

Each entry below lists a reproducibility variable, its assessed result, and the supporting LLM response.
Research Type: Experimental. LLM response: Extensive experiments on MS MARCO and ANTIQUE datasets show that RoDR significantly improves the retrieval results on both the original queries and different types of query variations.
Researcher Affiliation: Academia. LLM response: Xuanang Chen (1,2), Jian Luo (1,2), Ben He (1,2), Le Sun (2) and Yingfei Sun (1); (1) University of Chinese Academy of Sciences, Beijing, China; (2) Institute of Software, Chinese Academy of Sciences, Beijing, China.
Pseudocode: No. LLM response: The paper describes the method using mathematical equations and text but does not include structured pseudocode or algorithm blocks.
Open Source Code: Yes. LLM response: Our code and models are openly available at https://github.com/cxa-unique/RoDR.
Open Datasets: Yes. LLM response: We employ both MS MARCO passage and document datasets for our experiments. The MS MARCO passage (document) corpus contains 8.8 (3.2) million passages (documents), from which we construct about 0.40 (0.37) million training samples. ... The human-validated query variation data is based on the 200 original queries from the ANTIQUE dataset [Hashemi et al., 2020] (namely, the antique/train/split200-valid set available in ir_datasets [MacAvaney et al., 2021]).
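The ANTIQUE subset named in the response can be loaded directly through the ir_datasets API. A minimal sketch, assuming ir_datasets is installed and using only its standard load/queries_iter/qrels_iter calls:

```python
import ir_datasets

# Load the 200 validated ANTIQUE queries referenced above.
dataset = ir_datasets.load("antique/train/split200-valid")

# Queries are exposed as (query_id, text) tuples.
for query in dataset.queries_iter():
    print(query.query_id, query.text)

# Relevance judgments for the same queries.
for qrel in dataset.qrels_iter():
    print(qrel.query_id, qrel.doc_id, qrel.relevance)
    break
```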
Dataset Splits: Yes. LLM response: We use 6,980 (5,193) Dev queries in the MS MARCO passage (document) dataset for evaluation, along with the official metric, namely, MRR@10 (MRR@100). ... The max query length is 32, and the max passage (document) length is 128 (512).
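MRR@k is the mean, over the Dev queries, of the reciprocal rank of the first relevant document within the top k results. A minimal sketch of the metric; the function and argument names are ours, not those of the official MS MARCO evaluation script:

```python
from typing import Dict, List, Set

def mrr_at_k(run: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int) -> float:
    """Mean reciprocal rank at cutoff k.

    run:   query_id -> ranked list of doc_ids (best first)
    qrels: query_id -> set of relevant doc_ids
    """
    total = 0.0
    for qid, ranking in run.items():
        relevant = qrels.get(qid, set())
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(run) if run else 0.0

# Passage retrieval is scored with k=10, document retrieval with k=100.
```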
Hardware Specification: Yes. LLM response: For MS MARCO passage (document) retrieval, the DR model is trained with the learning rate of 5e-6 and per-device batch size of 16 (2) for 4 epochs on one (four) GeForce RTX 3090 GPU(s).
Software Dependencies: No. LLM response: Our model training is based on the Tevatron toolkit [Gao et al., 2022], with a parameter-shared BERT-Base model as the query and passage (document) encoders. The paper does not specify version numbers for these or other software components.
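The parameter-shared setup means a single BERT-Base encoder serves both queries and passages, with relevance scored by a dot product of their embeddings. The sketch below illustrates that idea with Hugging Face transformers; it is not Tevatron's actual model code, and the [CLS]-pooling choice is an assumption:

```python
import torch
from transformers import AutoModel, AutoTokenizer

class SharedEncoder(torch.nn.Module):
    """One BERT-Base model encodes both queries and passages (parameter sharing)."""

    def __init__(self, name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(name)

    def encode(self, texts, tokenizer, max_length):
        batch = tokenizer(texts, padding=True, truncation=True,
                          max_length=max_length, return_tensors="pt")
        # Use the [CLS] vector as the dense representation (pooling choice is an assumption).
        return self.bert(**batch).last_hidden_state[:, 0]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = SharedEncoder()
q = encoder.encode(["how do dense retrievers work"], tokenizer, max_length=32)                # query cap: 32 tokens
p = encoder.encode(["Dense retrievers map text to vectors ..."], tokenizer, max_length=128)   # passage cap: 128 tokens
score = q @ p.T  # dot-product relevance score
```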
Experiment Setup: Yes. LLM response: The max query length is 32, and the max passage (document) length is 128 (512). For MS MARCO passage (document) retrieval, the DR model is trained with the learning rate of 5e-6 and per-device batch size of 16 (2) for 4 epochs on one (four) GeForce RTX 3090 GPU(s). ... The negative passages (documents) in D_O are sampled from the provided official triples (the top-1k BM25 candidates), and there are 7 negatives in each training sample, namely, n = 7 in D_O and D_N. The loss weights w1, w2 and w3 in Eq. 8 are set as 1, 1, 0.2 (1, 0.1, 1) for MS MARCO passage (document) retrieval.
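Assuming Eq. 8 sums a ranking loss on original queries, a ranking loss on query variations, and an alignment term, the weighting and the 1-positive-plus-7-negatives softmax loss can be sketched as follows; the function names and the exact form of the alignment term are assumptions, not the paper's definitions:

```python
import torch
import torch.nn.functional as F

def ranking_loss(q_emb, pos_emb, neg_embs):
    """Softmax cross-entropy over one positive and n sampled negatives (n = 7 here)."""
    cands = torch.cat([pos_emb, neg_embs], dim=0)   # (1 + n, dim)
    scores = q_emb @ cands.T                        # (1, 1 + n) dot-product scores
    target = torch.zeros(1, dtype=torch.long)       # the positive sits at index 0
    return F.cross_entropy(scores, target)

def combined_loss(loss_orig, loss_var, loss_align, w1=1.0, w2=1.0, w3=0.2):
    # Weighted sum of the three terms; (1, 1, 0.2) for passage retrieval,
    # (1, 0.1, 1) for document retrieval, as quoted above.
    return w1 * loss_orig + w2 * loss_var + w3 * loss_align
```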