Pre-training Tasks for Embedding-based Large-scale Retrieval

Authors: Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a comprehensive study on the embedding-based retrieval models. We show that the key ingredient of learning a strong embedding-based Transformer model is the set of pre-training tasks. With adequately designed paragraph-level pre-training tasks, the Transformer models can remarkably improve over the widely-used BM-25 as well as embedding models without Transformers. The paper includes a dedicated section '4 EXPERIMENTS' with detailed tables (Table 3, Table 4, etc.) presenting performance metrics and ablation studies on datasets such as SQuAD and Natural Questions.
Researcher Affiliation | Collaboration | Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar (Carnegie Mellon University & Google); {wchang2,yiming}@cs.cmu.edu, {felixyu,yinwen,sanjivk}@google.com
Pseudocode | No | The paper describes the proposed methods and tasks in detail but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement indicating that the source code for the methodology described is publicly released, nor does it provide a direct link to a code repository.
Open Datasets | Yes | The two QA datasets we consider are SQuAD and Natural Questions. Note that each entry of QA datasets is a tuple (q, a, p)...
Dataset Splits | Yes | For each dataset, we consider different training/test splits of the data (1%/99%, 5%/95%, and 80%/20%) in the fine-tuning stage, and 10% of the training set is held out as the validation set for hyper-parameter tuning. (See the split sketch below the table.)
Hardware Specification | Yes | We pre-train the model on 32 TPU v3 chips for 100K steps with an Adam optimizer and batch size of 8192.
Software Dependencies | No | The paper mentions using an Adam optimizer and Transformer models, but does not specify any software dependencies with version numbers (e.g., a Python version, or specific library versions such as PyTorch or TensorFlow).
Experiment Setup | Yes | For both towers, the final embedding is generated by applying a linear layer on the hidden state of the [CLS] token. The embedding dimension is 512. The sequence lengths for the query encoder and document encoder are set to 64 and 288, respectively. We pre-train the model on 32 TPU v3 chips for 100K steps with an Adam optimizer and batch size of 8192. This process takes about 2.5 days. We use the Adam optimizer with an initial learning rate of 1 × 10^-4 and a warm-up ratio of 0.1, followed by a linear learning rate decay. For fine-tuning, the learning rate of Adam is set to 5 × 10^-5 with 2000 training steps and batch size 512. (See the model and schedule sketches below the table.)
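
For concreteness, the splitting procedure quoted in the Dataset Splits row could look roughly like the following. This is a minimal sketch, assuming the data is a list of (query, answer, passage) tuples; the function name and the use of scikit-learn's train_test_split are illustrative choices, not the authors' code.

```python
# Hypothetical sketch of the train/test splits described above:
# split into train/test (1%/99%, 5%/95%, or 80%/20%), then hold out
# 10% of the training portion as a validation set.
from sklearn.model_selection import train_test_split

def make_split(examples, train_fraction, seed=0):
    """Return (train, valid, test) lists of (q, a, p) tuples."""
    train, test = train_test_split(
        examples, train_size=train_fraction, random_state=seed)
    train, valid = train_test_split(
        train, test_size=0.10, random_state=seed)
    return train, valid, test

# Example: the 5%/95% split used in the fine-tuning stage.
# train, valid, test = make_split(squad_examples, train_fraction=0.05)
```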
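The Experiment Setup row describes a two-tower architecture in which each tower projects the [CLS] hidden state through a linear layer to a 512-dimensional embedding, with maximum sequence lengths of 64 (query) and 288 (document). The sketch below captures that structure; the choice of PyTorch, the Hugging Face bert-base-uncased checkpoint, and the class names Tower and TwoTowerRetriever are assumptions for illustration (the authors train on TPUs and do not release code).

```python
# Illustrative two-tower retrieval model: each tower applies a linear
# layer to the [CLS] hidden state to produce a 512-dim embedding.
# PyTorch + Hugging Face BERT are assumptions made for this sketch.
import torch
import torch.nn as nn
from transformers import AutoModel

class Tower(nn.Module):
    def __init__(self, model_name="bert-base-uncased", embed_dim=512):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.proj = nn.Linear(self.encoder.config.hidden_size, embed_dim)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        cls = hidden[:, 0]        # hidden state of the [CLS] token
        return self.proj(cls)     # 512-dim embedding

class TwoTowerRetriever(nn.Module):
    """Query tower (max length 64) and document tower (max length 288);
    relevance is scored by the inner product of the two embeddings."""
    def __init__(self):
        super().__init__()
        self.query_tower = Tower()
        self.doc_tower = Tower()

    def forward(self, q_ids, q_mask, d_ids, d_mask):
        q_emb = self.query_tower(q_ids, q_mask)   # (batch, 512)
        d_emb = self.doc_tower(d_ids, d_mask)     # (batch, 512)
        # In-batch score matrix; the diagonal holds the positive pairs.
        return q_emb @ d_emb.t()
```

Scoring in-batch query/document pairs this way is one common setup for training such a model with a softmax cross-entropy loss over the score matrix, with positives on the diagonal; whether this matches the authors' exact loss is not verifiable from the quoted text alone.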
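The optimizer settings quoted above (Adam, initial learning rate 1 × 10^-4, warm-up ratio 0.1, linear decay over 100K pre-training steps) can be expressed with a standard warm-up-plus-linear-decay schedule. The snippet below is a sketch using Hugging Face's get_linear_schedule_with_warmup; the authors' actual training code and framework are not specified in the paper.

```python
# Optimizer / schedule sketch matching the quoted setup: Adam with
# initial LR 1e-4, 10% warm-up, then linear decay over 100K steps.
import torch
from transformers import get_linear_schedule_with_warmup

model = TwoTowerRetriever()                      # from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
total_steps = 100_000
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * total_steps),     # warm-up ratio 0.1
    num_training_steps=total_steps)

# Inside the training loop, call scheduler.step() after optimizer.step().
# For fine-tuning, the quoted settings are lr=5e-5, 2000 steps,
# and batch size 512.
```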