TriSampler: A Better Negative Sampling Principle for Dense Retrieval
Authors: Zhen Yang, Zhou Shao, Yuxiao Dong, Jie Tang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluation show that TriSampler consistently attains superior retrieval performance across a diverse of representative retrieval models. ... Experiments |
| Researcher Affiliation | Academia | Department of Computer Science and Technology, Tsinghua University, Beijing, China |
| Pseudocode | Yes | Algorithm 1: Algorithm of TriSampler |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | We conduct experiments on the first retrieval stage of four benchmarks: three passage retrieval datasets: MS MARCO passage (MS Pas) (Nguyen et al. 2016), Natural Questions (NQ) (Kwiatkowski et al. 2019), and TriviaQA (TQA) (Joshi et al. 2017), and a document retrieval dataset: MS MARCO document (MS Doc) (Nguyen et al. 2016). |
| Dataset Splits | Yes | NQ: 58,880 training, 8,757 dev, 3,610 test, 21,015,324 documents; TQA: 60,413 training, 8,837 dev, 11,313 test, 21,015,324 documents; MS Pas: 502,939 training, 6,980 dev, 8,841,823 documents; MS Doc: 367,013 training, 5,193 dev, 3,213,835 documents |
| Hardware Specification | Yes | We implement TriSampler based on SOTA dense retrieval model AR2 (Zhang et al. 2021) and run all experiments on 8 NVIDIA Tesla A100 GPUs. |
| Software Dependencies | No | The paper mentions 'ERNIE-2.0-base' and 'Faiss' but does not specify their version numbers or any other software dependencies with version information. |
| Experiment Setup | Yes | In our experiments, the ratio of positive to negative pairs is set to 1:15, the inner product is leveraged to estimate the relevance score and Faiss (Johnson, Douze, and Jégou 2019) is adopted for efficient similarity search. We utilize the top-200 passages for NQ and TQA datasets and the top-400 documents for MS Pas and MS Doc datasets as negative candidates. |
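
The candidate-selection setup quoted in the Experiment Setup row can be illustrated with a small sketch. It is not the authors' code and does not implement TriSampler's sampling principle: it only shows the generic pipeline the row describes, scoring by inner product, keeping the top-k passages as negative candidates, and drawing 15 negatives per positive. The embeddings are random toy data, the search is brute-force NumPy standing in for a Faiss index, the negatives are drawn uniformly (an assumption made purely for illustration), and the names `top_k_inner_product` and `sample_negatives` are hypothetical.

```python
import numpy as np

TOP_K = 200          # candidate pool size reported for NQ and TQA
NUM_NEGATIVES = 15   # 1:15 positive-to-negative ratio from the setup row


def top_k_inner_product(query_vecs, passage_vecs, k):
    """Return indices of the k highest inner-product passages per query.

    Brute-force stand-in for an inner-product similarity index.
    """
    scores = query_vecs @ passage_vecs.T          # (num_queries, num_passages)
    return np.argsort(-scores, axis=1)[:, :k]     # descending by score


def sample_negatives(candidates, positive_ids, num_negatives, rng):
    """Remove known positives from the candidates, then sample uniformly.

    Uniform sampling is an illustrative assumption, not the paper's method.
    """
    pool = np.array([int(c) for c in candidates if int(c) not in positive_ids])
    return rng.choice(pool, size=num_negatives, replace=False)


# Toy data: random embeddings and one hypothetical positive per query.
rng = np.random.default_rng(0)
passages = rng.standard_normal((1000, 32)).astype(np.float32)
queries = rng.standard_normal((4, 32)).astype(np.float32)
positives = [{0}, {1}, {2}, {3}]

candidates = top_k_inner_product(queries, passages, TOP_K)
negatives = np.stack([
    sample_negatives(candidates[i], positives[i], NUM_NEGATIVES, rng)
    for i in range(len(queries))
])
print(negatives.shape)
```

For the MS Pas and MS Doc settings, `TOP_K` would be 400 under the same assumptions. At the corpus sizes listed in the Dataset Splits row, the brute-force score matrix would be replaced by an approximate or exact index such as the Faiss library the paper mentions.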