Combining Multiple Supervision for Robust Zero-Shot Dense Retrieval

Authors: Yan Fang, Qingyao Ai, Jingtao Zhan, Yiqun Liu, Xiaolong Wu, Zhao Cao

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on zero-shot DR benchmarks show that RMSC significantly improves ranking performance on the target domain compared to strong DR baselines and domain adaptation methods, while remaining stable during training; it can also be combined with query generation or second-stage pre-training.
Researcher Affiliation | Collaboration | Yan Fang (1), Qingyao Ai (1, *), Jingtao Zhan (1), Yiqun Liu (1), Xiaolong Wu (2), Zhao Cao (2). (1) Quan Cheng Laboratory; DCST, Tsinghua University; Zhongguancun Laboratory, Beijing, China. (2) Huawei Poisson Lab. Contact: fangy21@mails.tsinghua.edu.cn, aiqy@tsinghua.edu.cn
Pseudocode | No | The paper contains no sections or figures explicitly labeled 'Pseudocode' or 'Algorithm', nor any structured steps formatted as code blocks.
Open Source Code | No | The paper contains no explicit statement or link indicating that source code for the proposed method is publicly available.
Open Datasets | Yes | Following recent zero-shot DR research, we use the MS MARCO Passage Ranking dataset (Nguyen et al. 2016) as the source domain, with a corpus of 8.8M passages from web pages and 0.5M training queries. Each training query is coupled with a manually labeled positive passage, which together constitute the source supervised signals. As for the out-of-domain test sets, we select the BEIR dataset (Thakur et al. 2021) and the LoTTE benchmark (Santhanam et al. 2021). (See the dataset-loading sketch after the table.)
Dataset Splits | No | The paper mentions using MS MARCO for training and BEIR/LoTTE for testing but does not provide specific train/validation/test split percentages, sample counts, or explicit references to predefined validation splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using the PyTorch framework, the Hugging Face library, and the NLTK library, but does not specify version numbers, which are required for reproducible software dependencies.
Experiment Setup | Yes | When implementing RMSC, the number of special tokens k is set to 1, and each special token is randomly initialized. For weak supervision extraction, we use the NLTK library to divide a document into sentences. The ICT method is used for the LoTTE dataset and SCL for the BEIR dataset. We use the AdamW optimizer during training. The batch size is set to 512 and the learning rate is set to 1e-5. As for the training settings, we adopt random negative sampling and select one negative document for each query. (See the configuration sketch below.)
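
For concreteness, here is a minimal sketch of loading one of the out-of-domain test sets with the public beir package. The task name ("scifact") and the output directory are illustrative choices, not specified by the paper.

    # Minimal sketch: fetch and load a single BEIR task ("scifact" is an
    # illustrative choice; the paper evaluates on the full benchmark).
    from beir import util
    from beir.datasets.data_loader import GenericDataLoader

    url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
    data_path = util.download_and_unzip(url, "datasets")

    # corpus: {doc_id: {"title": ..., "text": ...}},
    # queries: {qid: text}, qrels: {qid: {doc_id: relevance}}
    corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")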
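
And a minimal sketch of the training configuration reported in the Experiment Setup row, assuming a BERT-style Hugging Face backbone. The backbone name ("bert-base-uncased"), the special-token string ("[WEAK]"), and the helper functions are assumptions, since the paper releases no code.

    # Sketch of the reported RMSC training setup. Backbone and special-token
    # name are assumptions, not taken from the paper.
    import random
    import nltk
    from torch.optim import AdamW
    from transformers import AutoModel, AutoTokenizer

    nltk.download("punkt", quiet=True)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    # k = 1 special token; resize_token_embeddings gives the new embedding
    # row a random initialization by default.
    tokenizer.add_special_tokens({"additional_special_tokens": ["[WEAK]"]})
    model.resize_token_embeddings(len(tokenizer))

    # Weak supervision extraction: split each document into sentences with NLTK.
    def split_sentences(document):
        return nltk.sent_tokenize(document)

    # Hyperparameters reported in the paper.
    optimizer = AdamW(model.parameters(), lr=1e-5)
    batch_size = 512

    # Random negative sampling: one negative document per query.
    def sample_negative(corpus_ids, positive_id):
        negative = random.choice(corpus_ids)
        while negative == positive_id:
            negative = random.choice(corpus_ids)
        return negative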