Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval

Authors: Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul N. Bennett, Junaid Ahmed, Arnold Overwijk

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate the effectiveness of ANCE on web search, question answering, and in a commercial search engine, showing ANCE dot-product retrieval nearly matches the accuracy of BERT-based cascade IR pipeline. We also empirically validate our theory that negative sampling with ANCE better approximates the oracle importance sampling procedure and improves learning convergence.
Researcher Affiliation | Industry | Microsoft Corporation. lexion, chenyan.xiong, yeli1, kwokfung.tang, jialliu, paul.n.bennett, jahmed, arnold.overwijk@microsoft.com
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code and trained models are available at http://aka.ms/ance.
Open Datasets | Yes | The web search experiments use the TREC 2019 Deep Learning (DL) Track (Craswell et al., 2020). The Open QA experiments use the Natural Questions (NQ) (Kwiatkowski et al., 2019) and Trivia QA (TQA) (Joshi et al., 2017), following the exact settings from Karpukhin et al. (2020).
Dataset Splits | Yes | The training and development sets are from MS MARCO, which includes passage level relevance labels for one million Bing queries (Bajaj et al., 2016). The testing sets are labeled by NIST accessors on the top 10 ranked results from past Track participants (Craswell et al., 2020).
Hardware Specification | Yes | We use batch size 8 and gradient accumulation step 2 on 4 V100 32GB GPUs... We measured ANCE efficiency using one 32GB V100 GPU, Intel(R) Xeon(R) Platinum 8168 CPU and 650GB of RAM memory.
Software Dependencies | No | The paper mentions software like RoBERTa base, Faiss, and LAMB optimizer, but does not provide specific version numbers for these or other relevant software dependencies such as Python or PyTorch.
Experiment Setup | Yes | We use batch size 8 and gradient accumulation step 2 on 4 V100 32GB GPUs... For each positive, we uniformly sample one negative from ANN top 200... The Trainer produces a model checkpoint every 5k or 10k training batches... The optimization uses LAMB optimizer, learning rate 5e-6 for document and 1e-6 for passage retrieval, and linear warm-up and decay after 5000 steps.
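
The experiment setup row above can be read as a concrete training configuration. Below is a minimal sketch of that configuration, assuming PyTorch, Faiss, and HuggingFace transformers. It is not the authors' released code: random placeholder embeddings stand in for ANCE encoder outputs, AdamW stands in for the LAMB optimizer named in the paper, and the total training step count is an assumed value.

import faiss
import numpy as np
import torch
from transformers import get_linear_schedule_with_warmup

rng = np.random.default_rng(0)

# Placeholder corpus and query embeddings standing in for ANCE encoder outputs
# (768-dim, matching RoBERTa base); real vectors come from the trained encoder.
dim, num_docs = 768, 10_000
doc_embs = rng.standard_normal((num_docs, dim), dtype=np.float32)
query_emb = rng.standard_normal((1, dim), dtype=np.float32)

# Dot-product (inner-product) ANN index over the corpus, as in Faiss-based
# dense retrieval; an exact flat index is used here for simplicity.
index = faiss.IndexFlatIP(dim)
index.add(doc_embs)

# Retrieve the ANN top 200 and uniformly sample one negative per positive,
# mirroring the negative construction described in the setup row above.
_, top_ids = index.search(query_emb, 200)
negative_id = rng.choice(top_ids[0])

# Optimizer and schedule with the reported hyperparameters: learning rate
# 5e-6 for document retrieval (1e-6 for passage), linear warm-up over 5000
# steps followed by linear decay. The paper uses LAMB; AdamW is a stand-in
# here, and num_training_steps is an assumption, not a value from the paper.
model = torch.nn.Linear(dim, dim)  # placeholder for the RoBERTa-based encoder
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=5000, num_training_steps=100_000
)

batch_size = 8        # per the reported setup
grad_accum_steps = 2  # gradient accumulation, run on 4 V100 32GB GPUs

The exact IndexFlatIP index is chosen only for clarity; the paper's setting performs the same inner-product search over corpus embeddings that are periodically refreshed as the encoder trains.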