LexMAE: Lexicon-Bottlenecked Pretraining for Large-Scale Retrieval
Authors: Tao Shen, Xiubo Geng, Chongyang Tao, Can Xu, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the ad-hoc retrieval benchmark, MS-Marco, it achieves 42.6% MRR@10 with 45.8 QPS for the passage dataset and 44.4% MRR@100 with 134.8 QPS for the document dataset, by a CPU machine. And LexMAE shows state-of-the-art zero-shot transfer capability on BEIR benchmark with 12 datasets. |
| Researcher Affiliation | Industry | Tao Shen, Xiubo Geng, Chongyang Tao, Can Xu, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang (Microsoft). {shentao,xigeng,chotao,caxu,xiaolhu,binxjia,linjya,djiang}@microsoft.com |
| Pseudocode | No | The paper contains an illustration (Figure 1) and mathematical formulas (e.g., Equation 16), but no clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | We released our codes and models at https://github.com/taoshen58/LexMAE. |
| Open Datasets | Yes | Following Formal et al. (2021a), we first employ the widely-used passage retrieval datasets, MS-Marco (Nguyen et al., 2016)... Besides, we evaluate the zero-shot transferability of our model on BEIR benchmark (Thakur et al., 2021). |
| Dataset Splits | Yes | We pre-train on the MS-Marco collection (Nguyen et al., 2016)... We report MRR@10 (M@10) and Recall@1/50/100/1K for MS-Marco Dev (passage)... In the first stage, we sample negatives for each query q within top-K1 document candidates by the BM25 retrieval system... Then, we sample the hard negatives N^(hn1) for each query q within top-K2 candidates based on the relevance scores... Lastly, we further sample hard negatives N^(hn2) for each query q within top-K3 candidates by the 2nd-stage retriever. (See the negative-sampling sketch below the table.) |
| Hardware Specification | Yes | the pre-training is completed on 8 A100 GPUs within 14h. In contrast to (Wang et al., 2022) using 4 GPUs for fine-tuning, we limited all the fine-tuning experiments on one A100 GPU. |
| Software Dependencies | No | The paper mentions initialization from BERT-base (Devlin et al., 2019) and the use of Anserini (Yang et al., 2017), but does not provide version numbers for the software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | the batch size is 2048, the max length is 144, the learning rate is 3×10⁻⁴, the number of training steps is 80k, the masking percentage (α%) of encoder is 30%, and that ((α + β)%) of decoder is 50%. Meantime, the random seed is always 42... learning rate is set to 2×10⁻⁵ by following Shen et al. (2022), the number of training epochs is set to 3... The batch size (w.r.t. the number of queries) is set to 24 with 1 positive and 15 negative documents... λ1 = 0.002, λ2 = 0.008, λ3 = 0.008. (See the configuration sketch below the table.) |
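
The three-stage hard-negative sampling quoted in the Dataset Splits row can be summarized in code. The sketch below is an illustration only, not the released implementation: the function and argument names (`sample_negatives`, `build_training_negatives`, `bm25_topk`, `stage1_topk`, `stage2_topk`, `qrels`) are hypothetical, and the top-K1/K2/K3 cutoffs are assumed to be reflected in precomputed candidate lists from BM25, the 1st-stage retriever, and the 2nd-stage retriever.

```python
# Minimal sketch (not the authors' code) of the three-stage hard-negative
# sampling described in the Dataset Splits row.
import random


def sample_negatives(candidates, positives, num_negatives):
    """Sample negatives for one query from a top-K candidate list,
    excluding known positive (relevant) documents."""
    pool = [doc_id for doc_id in candidates if doc_id not in positives]
    return random.sample(pool, min(num_negatives, len(pool)))


def build_training_negatives(queries, bm25_topk, stage1_topk, stage2_topk,
                             qrels, num_negatives=15):
    """Return per-query negative sets for the three fine-tuning stages.

    bm25_topk / stage1_topk / stage2_topk: dicts mapping a query id to its
    top-K1 / top-K2 / top-K3 ranked doc ids from BM25, the 1st-stage
    retriever, and the 2nd-stage retriever (all assumed precomputed).
    qrels: dict mapping a query id to its set of relevant doc ids.
    """
    negatives = {}
    for qid in queries:
        positives = qrels[qid]
        negatives[qid] = {
            "stage1": sample_negatives(bm25_topk[qid], positives, num_negatives),
            "stage2": sample_negatives(stage1_topk[qid], positives, num_negatives),  # N^(hn1)
            "stage3": sample_negatives(stage2_topk[qid], positives, num_negatives),  # N^(hn2)
        }
    return negatives
```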
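
For reference, the hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration sketch. This is a plain-Python restatement of the reported values, not the authors' training script; the dictionary keys are hypothetical names.

```python
# Hedged restatement of the pre-training and fine-tuning settings quoted in
# the Experiment Setup row (key names are assumptions, values are as reported).
PRETRAIN_CONFIG = {
    "batch_size": 2048,
    "max_length": 144,
    "learning_rate": 3e-4,
    "training_steps": 80_000,
    "encoder_mask_ratio": 0.30,   # alpha%
    "decoder_mask_ratio": 0.50,   # (alpha + beta)%
    "seed": 42,
}

FINETUNE_CONFIG = {
    "learning_rate": 2e-5,        # following Shen et al. (2022)
    "epochs": 3,
    "queries_per_batch": 24,      # each query paired with 1 positive + 15 negatives
    "positives_per_query": 1,
    "negatives_per_query": 15,
    "lambda1": 0.002,
    "lambda2": 0.008,
    "lambda3": 0.008,
}
```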