LexMAE: Lexicon-Bottlenecked Pretraining for Large-Scale Retrieval
Authors: Tao Shen, Xiubo Geng, Chongyang Tao, Can Xu, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the ad-hoc retrieval benchmark, MS-Marco, it achieves 42.6% MRR@10 with 45.8 QPS for the passage dataset and 44.4% MRR@100 with 134.8 QPS for the document dataset, by a CPU machine. And LexMAE shows state-of-the-art zero-shot transfer capability on BEIR benchmark with 12 datasets. |
| Researcher Affiliation | Industry | Tao Shen, Xiubo Geng, Chongyang Tao, Can Xu, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang (Microsoft). {shentao,xigeng,chotao,caxu,xiaolhu,binxjia,linjya,djiang}@microsoft.com |
| Pseudocode | No | The paper contains an illustration (Figure 1) and mathematical formulas (e.g., Equation 16), but no clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | We released our codes and models at https://github.com/taoshen58/LexMAE. |
| Open Datasets | Yes | Following Formal et al. (2021a), we first employ the widely-used passage retrieval datasets, MS-Marco (Nguyen et al., 2016)... Besides, we evaluate the zero-shot transferability of our model on BEIR benchmark (Thakur et al., 2021). |
| Dataset Splits | Yes | We pre-train on the MS-Marco collection (Nguyen et al., 2016)... We report MRR@10 (M@10) and Recall@1/50/100/1K for MS-Marco Dev (passage)... In the first stage, we sample negatives for each query q within top-K1 document candidates by the BM25 retrieval system... Then, we sample the hard negatives N^(hn1) for each query q within top-K2 candidates based on the relevance scores... Lastly, we further sample hard negatives N^(hn2) for each query q within top-K3 candidates by the 2nd-stage retriever. (See the negative-sampling sketch below the table.) |
| Hardware Specification | Yes | the pre-training is completed on 8 A100 GPUs within 14h. In contrast to (Wang et al., 2022) using 4 GPUs for fine-tuning, we limited all the fine-tuning experiments on one A100 GPU. |
| Software Dependencies | No | The paper mentions initialization from BERT-base (Devlin et al., 2019) and the use of Anserini (Yang et al., 2017), but does not provide version numbers for the software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | the batch size is 2048, the max length is 144, the learning rate is 3×10⁻⁴, the number of training steps is 80k, the masking percentage (α%) of encoder is 30%, and that ((α + β)%) of decoder is 50%. Meantime, the random seed is always 42... learning rate is set to 2×10⁻⁵ by following Shen et al. (2022), the number of training epochs is set to 3... The batch size (w.r.t. the number of queries) is set to 24 with 1 positive and 15 negative documents... λ1 = 0.002, λ2 = 0.008, λ3 = 0.008. (See the configuration sketch below the table.) |
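
The three-stage hard-negative sampling quoted in the Dataset Splits row can be summarized in code. The sketch below is an illustration only, not the released implementation: the function and argument names (`sample_negatives`, `build_training_negatives`, `bm25_topk`, `stage1_topk`, `stage2_topk`, `qrels`) are hypothetical, and the top-K1/K2/K3 cutoffs are assumed to be reflected in precomputed candidate lists from BM25, the 1st-stage retriever, and the 2nd-stage retriever.

```python
# Minimal sketch (not the authors' code) of the three-stage hard-negative
# sampling described in the Dataset Splits row.
import random


def sample_negatives(candidates, positives, num_negatives):
    """Sample negatives for one query from a top-K candidate list,
    excluding known positive (relevant) documents."""
    pool = [doc_id for doc_id in candidates if doc_id not in positives]
    return random.sample(pool, min(num_negatives, len(pool)))


def build_training_negatives(queries, bm25_topk, stage1_topk, stage2_topk,
                             qrels, num_negatives=15):
    """Return per-query negative sets for the three fine-tuning stages.

    bm25_topk / stage1_topk / stage2_topk: dicts mapping a query id to its
    top-K1 / top-K2 / top-K3 ranked doc ids from BM25, the 1st-stage
    retriever, and the 2nd-stage retriever (all assumed precomputed).
    qrels: dict mapping a query id to its set of relevant doc ids.
    """
    negatives = {}
    for qid in queries:
        positives = qrels[qid]
        negatives[qid] = {
            "stage1": sample_negatives(bm25_topk[qid], positives, num_negatives),
            "stage2": sample_negatives(stage1_topk[qid], positives, num_negatives),  # N^(hn1)
            "stage3": sample_negatives(stage2_topk[qid], positives, num_negatives),  # N^(hn2)
        }
    return negatives
```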
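
For reference, the hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration sketch. This is a plain-Python restatement of the reported values, not the authors' training script; the dictionary keys are hypothetical names.

```python
# Hedged restatement of the pre-training and fine-tuning settings quoted in
# the Experiment Setup row (key names are assumptions, values are as reported).
PRETRAIN_CONFIG = {
    "batch_size": 2048,
    "max_length": 144,
    "learning_rate": 3e-4,
    "training_steps": 80_000,
    "encoder_mask_ratio": 0.30,   # alpha%
    "decoder_mask_ratio": 0.50,   # (alpha + beta)%
    "seed": 42,
}

FINETUNE_CONFIG = {
    "learning_rate": 2e-5,        # following Shen et al. (2022)
    "epochs": 3,
    "queries_per_batch": 24,      # each query paired with 1 positive + 15 negatives
    "positives_per_query": 1,
    "negatives_per_query": 15,
    "lambda1": 0.002,
    "lambda2": 0.008,
    "lambda3": 0.008,
}
```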