A Unified Pretraining Framework for Passage Ranking and Expansion

Authors: Ming Yan, Chenliang Li, Bin Bi, Wei Wang, Songfang Huang

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | An extensive set of experiments has been conducted on two large-scale passage retrieval datasets to demonstrate the state-of-the-art results of the proposed framework in both first-stage retrieval and final re-ranking.
Researcher Affiliation | Industry | Ming Yan, Chenliang Li, Bin Bi, Wei Wang, Songfang Huang (Alibaba Group); {ym119608, lcl193798, b.bi, hebian.ww, songfang.hsf}@alibaba-inc.com
Pseudocode | No | The paper describes the model architecture and procedures in detail and provides diagrams, but it does not include explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper neither states that the code is open-sourced nor links to a code repository.
Open Datasets | Yes | MS MARCO Passage Retrieval (https://github.com/microsoft/MSMARCO-Passage-Ranking) is one of the largest passage ranking datasets, with about 8.8M passages obtained from the top-10 results retrieved by the Bing search engine for about 1M real user queries. The training set contains about 40M tuples of a query with relevant and non-relevant passages. There are about 500K distinct query-relevant passage pairs in the training set, where each query has one relevant passage on average. The development and test sets contain approximately 6,900 queries each, but relevance labels are public only for the development set. TREC 2019 Deep Learning Track (https://microsoft.github.io/TREC-2019-Deep-Learning/) also uses a large human-generated set of training labels drawn from the MS MARCO dataset. Unlike MS MARCO, it uses a different hold-out test set and relies on relevance judges to evaluate the quality of passage rankings. It has 200 test queries, whose passages are labelled by NIST assessors with multi-graded judgments, allowing NDCG to be measured. (A data-loading sketch follows the table.)
Dataset Splits | Yes | The development and test sets contain approximately 6,900 queries each, but relevance labels are made public only for the development set. ... For each passage, we choose the top-20 generated queries for passage expansion, as they give the best overall MRR@10 on the MS MARCO development set. (See the passage-expansion sketch after the table.)
Hardware Specification | Yes | We use the Nvidia T40 GPU for serving.
Software Dependencies | No | The paper mentions using a Transformer encoder-decoder architecture and concepts from BERT and GPT, but it does not specify software dependencies with version numbers (e.g., Python, TensorFlow/PyTorch, CUDA).
Experiment Setup | Yes | We train it with a batch size of 256 and a maximum sequence length of 512 for 40 epochs. For the decoder pre-training, we also use the same optimizer and pre-training datasets as in BERT. We use multiple consecutive sentences of up to 400 tokens as the source text input to the encoder, and use the subsequent sentence as the target text for the decoder. We train it with a batch size of 256 and a maximum sequence length of 512 for 2 epochs. To better adapt the model to the target corpus, we also continue pre-training the UED on the MS MARCO document corpus (0.5B words) with the two-stage pre-training protocol for another 100K+100K steps with a learning rate of 1e-5. ... We set the mini-batch size to 24 and the learning rate to 5e-6. For the retriever, we truncate the maximum input length to 384, and limit the query length to 30 tokens when decoding. ... We fine-tune with a maximum sequence length of 384 for 500K steps and checkpoint each model every 50K steps. (These hyperparameters are collected in the configuration sketch after the table.)
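
To ground the dataset description above, here is a minimal Python sketch of reading the MS MARCO passage-ranking training triples. It assumes the standard triples.train.small.tsv layout distributed with the dataset (query, relevant passage, non-relevant passage, tab-separated); the file name and path are illustrative, not taken from the paper.

import csv
from typing import Iterator, Tuple

def read_triples(path: str) -> Iterator[Tuple[str, str, str]]:
    # Each row is: query \t relevant passage \t non-relevant passage.
    with open(path, newline="", encoding="utf-8") as f:
        for query, positive, negative in csv.reader(f, delimiter="\t"):
            yield query, positive, negative

# Peek at the first few training tuples.
for i, (q, pos, neg) in enumerate(read_triples("triples.train.small.tsv")):
    if i >= 3:
        break
    print(q[:60], "||", pos[:60], "||", neg[:60])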
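
The top-20 query expansion quoted under Dataset Splits can be sketched as follows. Here generate_queries is a hypothetical stand-in for the paper's decoder; any beam-search text-generation model that returns k queries per passage would fit, and k = 20 follows the paper's tuning on the MS MARCO development set.

from typing import Callable, List

def expand_passage(passage: str,
                   generate_queries: Callable[[str, int], List[str]],
                   k: int = 20) -> str:
    # Append the top-k model-generated queries to the passage text, so the
    # expanded passage can then be indexed by a standard term-based retriever.
    return passage + " " + " ".join(generate_queries(passage, k))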
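
Finally, the hyperparameters quoted in the Experiment Setup cell are easier to compare when collected in one place. The dictionary structure and key names below are ours; the values are copied from the quoted text.

PRETRAIN_CFG = {
    "encoder": {"batch_size": 256, "max_seq_length": 512, "epochs": 40},
    "decoder": {"batch_size": 256, "max_seq_length": 512, "epochs": 2,
                "source_max_tokens": 400},  # consecutive sentences as encoder input
    # Continued two-stage pre-training on the MS MARCO document corpus.
    "continued": {"steps": (100_000, 100_000), "learning_rate": 1e-5},
}

FINETUNE_CFG = {
    "batch_size": 24,
    "learning_rate": 5e-6,
    "max_seq_length": 384,           # retriever input truncation
    "max_decoded_query_tokens": 30,  # query length limit when decoding
    "total_steps": 500_000,
    "checkpoint_every_steps": 50_000,
}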