Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
In defense of dual-encoders for neural ranking
Authors: Aditya Menon, Sadeep Jayasumana, Ankit Singh Rawat, Seungyeon Kim, Sashank Reddi, Sanjiv Kumar
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | First, we establish theoretically that with a sufficiently large encoder size, DE models can capture a broad class of scores without cross-attention. Second, we show that on real-world problems, the gap between CA and DE models may be due to the latter overfitting to the training set. To mitigate this, we propose a distillation strategy that focuses on preserving the ordering amongst documents, and confirm its efficacy on neural re-ranking benchmarks. |
| Researcher Affiliation | Industry | Google Research, New York, USA. |
| Pseudocode | No | The paper does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We present results on MSMARCO-Passage (Nguyen et al., 2016) and Natural Questions (NQ) (Kwiatkowski et al., 2019). |
| Dataset Splits | Yes | We train a series of BERT-based CA and DE models on the (small) triplets training set, employing 6-layer BERT models... For each model, we compute the mean reciprocal rank (MRR)@10 (Radev et al., 2002) on the provided train and dev set. (We shall refer to the dev set as the test set for simplicity.) |
| Hardware Specification | No | The paper mentions using "BERT-based CA and DE models" and "6-layer BERT models" but does not specify any hardware details such as GPU/CPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions using "transformer encoders initialised with the standard pre-trained BERT model checkpoints" but does not provide specific versions for any software components, libraries, or programming languages. |
| Experiment Setup | Yes | We optimise all methods for a maximum of 3 × 10^5 steps using Adam with weight decay, with a batch size of 128 and a learning rate of 2.8 × 10^-5 (i.e., a 4× scaling of the choices in Hofstätter et al. (2020a)). For all models, at the output layer we apply dropout at rate 0.1 and layer normalisation. We use a sequence length of 30 for queries, and 200 for passages. |
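The Dataset Splits row names MRR@10 (Radev et al., 2002) as the evaluation metric. Because the authors' evaluation code is not released, the following is only a minimal sketch of how MRR@10 is commonly computed: each query contributes the reciprocal rank of its first relevant passage within the top 10 results (or 0 if none appears there), and these values are averaged over all queries. The function name `mrr_at_k` and the toy document ids are hypothetical.

```python
from typing import Sequence, Set


def mrr_at_k(rankings: Sequence[Sequence[str]],
             relevant: Sequence[Set[str]],
             k: int = 10) -> float:
    """Mean reciprocal rank truncated at k (hypothetical helper).

    rankings[i] is the ranked list of document ids returned for query i;
    relevant[i] is the set of ids judged relevant for query i.
    A query whose first relevant document falls outside the top k contributes 0.
    """
    if len(rankings) != len(relevant):
        raise ValueError("rankings and relevant must have the same length")
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        # Positions are 1-based; stop at the first relevant hit.
        for position, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in rel:
                total += 1.0 / position
                break
    return total / len(rankings)


# Toy example with three queries (ids are made up).
runs = [["d3", "d1", "d7"], ["d2", "d5"], ["d9", "d8"]]
qrels = [{"d1"}, {"d2"}, {"d4"}]
print(mrr_at_k(runs, qrels))  # (1/2 + 1/1 + 0) / 3 = 0.5
```

Only the first relevant hit counts, so MRR@10 rewards placing one relevant passage near the top. This is why it suits MS MARCO-style evaluation, where most queries have a single judged relevant passage.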
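The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. The sketch below is illustrative only and is not the authors' code. The class and field names are hypothetical, the 6-layer depth is taken from the Dataset Splits row, and settings the excerpt does not state (such as the weight-decay coefficient) are left as `None` instead of being guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RankerTrainConfig:
    """Training settings as quoted in the report (field names are hypothetical)."""
    optimizer: str = "adam_with_weight_decay"
    max_steps: int = 3 * 10**5
    batch_size: int = 128
    learning_rate: float = 2.8e-5
    weight_decay: Optional[float] = None  # coefficient not reported in the excerpt
    output_dropout: float = 0.1           # dropout rate at the output layer
    output_layer_norm: bool = True        # layer normalisation at the output layer
    max_query_len: int = 30               # query sequence length
    max_passage_len: int = 200            # passage sequence length
    num_bert_layers: int = 6              # from the Dataset Splits row

    def max_examples_seen(self) -> int:
        """Upper bound on training examples processed: max steps times batch size."""
        return self.max_steps * self.batch_size


cfg = RankerTrainConfig()
print(cfg.max_examples_seen())  # 38400000
```

Freezing the dataclass keeps the reported values from being changed by accident. Any setting that has to be filled in to reproduce the runs, such as weight decay or a learning-rate schedule, would then be an explicit and visible assumption.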