Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval

Authors: Shunyu Zhang, Yaobo Liang, Ming Gong, Daxin Jiang, Nan Duan

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on four cross-lingual retrieval tasks show MSM significantly outperforms existing advanced pre-training models, demonstrating the effectiveness and stronger cross-lingual retrieval capabilities of our approach.
Researcher Affiliation | Industry | Microsoft Research Asia; Microsoft STC Asia
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any links to open-source code for the described methodology, nor does it explicitly state that code will be released.
Open Datasets | Yes | We evaluate our model with other counterparts on 4 popular datasets: Mr. TyDi is for query-passage retrieval, XOR Retrieve is cross-lingual retrieval for open-domain QA, and Mewsli-X and LAReQA are for language-agnostic retrieval. ... Following Wenzek et al. (2019), we collect a clean version of Common Crawl (CC) including a 2,500GB multi-lingual corpus covering 108 languages...
Dataset Splits | Yes | For the Mr. TyDi dataset, the original paper adopts the Natural Questions data (Kwiatkowski et al., 2019) for fine-tuning, while Zhang et al. (2022b) later suggest fine-tuning on MS MARCO for better results... For Mewsli-X and LAReQA, we follow the settings in XTREME-R, where Mewsli-X is fine-tuned on a predefined set of English-only mention-entity pairs and LAReQA on the English QA pairs from the SQuAD v1.1 train set.
Hardware Specification | Yes | We conduct pre-training on 8 A100 GPUs for about 200k steps. ... All these experiments are conducted on 8 NVIDIA Tesla A100 GPUs. ... all the evaluations are conducted on NVIDIA Tesla V100 GPU.
Software Dependencies | No | The paper mentions using the 'Adam optimizer' and 'linear warm-up' for training, but does not specify version numbers for any software, libraries, or programming languages used.
Experiment Setup | Yes | We use a learning rate of 4e-5 and the Adam optimizer with a linear warm-up. ... we limit the length of each sentence to 64 words (longer parts are truncated) and split documents with more than 32 sentences into smaller ones, each containing at most 32 sentences. ... For the fine-tuning stage... Mr. TyDi... with a learning rate of 2e-5. The model is trained for up to 3 epochs with a mini-batch size of 64. When fine-tuning on the NQ dataset, it is up to 40 epochs with a mini-batch size of 128. When further fine-tuning on Mr. TyDi's in-language data, it is 40 epochs, a mini-batch size of 128, and a 1e-5 learning rate. (Hedged sketches of these pre-processing and optimization settings follow the table.)
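
The corpus pre-processing quoted in the Experiment Setup row (sentences truncated to 64 words, long documents split into chunks of at most 32 sentences) can be summarized in a minimal sketch. This is not the authors' released code (none is available); the function names and the whitespace tokenization are assumptions made purely for illustration.

```python
# Minimal sketch of the pre-training corpus preparation described in the
# paper. Assumptions: whitespace tokenization, and illustrative names
# (truncate_sentence, split_document) that do not come from the authors.
from typing import List

MAX_WORDS_PER_SENTENCE = 64   # "limit the length of each sentence to 64 words"
MAX_SENTENCES_PER_DOC = 32    # "split documents with more than 32 sentences"


def truncate_sentence(sentence: str, max_words: int = MAX_WORDS_PER_SENTENCE) -> str:
    """Keep only the first `max_words` whitespace-separated tokens."""
    words = sentence.split()
    return " ".join(words[:max_words])


def split_document(sentences: List[str],
                   max_sentences: int = MAX_SENTENCES_PER_DOC) -> List[List[str]]:
    """Truncate each sentence, then split the document into chunks
    containing at most `max_sentences` sentences each."""
    truncated = [truncate_sentence(s) for s in sentences]
    return [truncated[i:i + max_sentences]
            for i in range(0, len(truncated), max_sentences)]


if __name__ == "__main__":
    # Example: a 70-sentence document of over-long sentences.
    doc = [f"word{i} " * 80 for i in range(70)]
    chunks = split_document(doc)
    print([len(c) for c in chunks])     # -> [32, 32, 6]
    print(len(chunks[0][0].split()))    # -> 64 (sentence truncated)
```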
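
The reported optimization settings can likewise be collected into a plain configuration sketch for anyone re-running the experiments. The grouping and key names are assumptions; the paper states these values in prose only, and the learning rate for NQ fine-tuning is not given separately in the quoted text.

```python
# Hedged summary of the hyperparameters reported in the paper. Key names
# and grouping are illustrative, not an official configuration file.
PRETRAINING = {
    "optimizer": "Adam",
    "learning_rate": 4e-5,
    "lr_schedule": "linear warm-up",
    "steps": 200_000,                     # "about 200k steps"
    "hardware": "8x NVIDIA A100",
    "max_words_per_sentence": 64,
    "max_sentences_per_document": 32,
}

FINE_TUNING = {
    # Mr. TyDi fine-tuned on MS MARCO, as suggested by Zhang et al. (2022b).
    "mr_tydi_ms_marco": {"learning_rate": 2e-5, "epochs": 3, "batch_size": 64},
    # NQ fine-tuning; learning rate not stated separately in the quoted setup.
    "nq": {"epochs": 40, "batch_size": 128},
    # Further fine-tuning on Mr. TyDi's in-language data.
    "mr_tydi_in_language": {"learning_rate": 1e-5, "epochs": 40, "batch_size": 128},
}

EVALUATION = {"hardware": "NVIDIA Tesla V100"}
```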