Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval
Authors: Shunyu Zhang, Yaobo Liang, Ming Gong, Daxin Jiang, Nan Duan
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on four cross-lingual retrieval tasks show MSM significantly outperforms existing advanced pre-training models, demonstrating the effectiveness and stronger cross-lingual retrieval capabilities of our approach. |
| Researcher Affiliation | Industry | Microsoft Research Asia; Microsoft STC Asia |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any specific links to open-source code for the described methodology, nor does it explicitly state that code will be released. |
| Open Datasets | Yes | We evaluate our model with other counterparts on 4 popular datasets: Mr. TyDi is for query-passage retrieval, XOR Retrieve is cross-lingual retrieval for open-domain QA, Mewsli-X and LAReQA are for language-agnostic retrieval. ... Following Wenzek et al. (2019), we collect a clean version of Common Crawl (CC) including a 2,500GB multi-lingual corpus covering 108 languages... |
| Dataset Splits | Yes | For the Mr. TyDi dataset, the original paper adopts the Natural Questions data (Kwiatkowski et al., 2019) for fine-tuning, while later Zhang et al. (2022b) suggest fine-tuning on MS MARCO for better results... For Mewsli-X and LAReQA, we follow the settings in XTREME-R, where Mewsli-X is fine-tuned on a predefined set of English-only mention-entity pairs and LAReQA on the English QA pairs from the SQuAD v1.1 train set. |
| Hardware Specification | Yes | We conduct pre-training on 8 A100 GPUs for about 200k steps. ... All these experiments are conducted on 8 NVIDIA Tesla A100 GPUs. ... all the evaluations are conducted on NVIDIA Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'linear warm-up' for training, but does not specify version numbers for any software, libraries, or programming languages used. |
| Experiment Setup | Yes | We use a learning rate of 4e-5 and the Adam optimizer with a linear warm-up. ... we limit the length of each sentence to 64 words (longer parts are truncated) and split documents with more than 32 sentences into smaller ones, each containing at most 32 sentences. ... For the fine-tuning stage... Mr. TyDi... with a learning rate of 2e-5. The model is trained for up to 3 epochs with a mini-batch size of 64. When fine-tuning on the NQ dataset, it is up to 40 epochs with a mini-batch size of 128. When further fine-tuning on Mr. TyDi's in-language data, it is 40 epochs, mini-batch size 128, and 1e-5 learning rate. |
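The Experiment Setup row is effectively a compact configuration description. The sketch below restates it as Python for quick reference; it is not the authors' released code. The function name, configuration keys, and stage labels are illustrative assumptions, and the dictionary only collects the hyperparameters quoted above.

```python
# Minimal sketch of the reported pre-processing and training settings.
# Assumption: function/key names are illustrative, not from the paper's codebase.
from typing import List

MAX_WORDS_PER_SENTENCE = 64   # longer sentences are truncated (per the quoted setup)
MAX_SENTENCES_PER_DOC = 32    # longer documents are split into chunks of <= 32 sentences


def chunk_document(sentences: List[str]) -> List[List[str]]:
    """Truncate each sentence to 64 words, then split the document into
    chunks of at most 32 sentences, as described in the Experiment Setup row."""
    truncated = [" ".join(s.split()[:MAX_WORDS_PER_SENTENCE]) for s in sentences]
    return [
        truncated[i : i + MAX_SENTENCES_PER_DOC]
        for i in range(0, len(truncated), MAX_SENTENCES_PER_DOC)
    ]


# Reported optimisation settings, collected as a plain dict for reference.
TRAIN_CONFIG = {
    "pretrain": {"lr": 4e-5, "optimizer": "Adam", "schedule": "linear warm-up",
                 "steps": 200_000, "hardware": "8x A100"},
    "finetune": {"lr": 2e-5, "epochs": 3, "batch_size": 64},
    "finetune_nq": {"epochs": 40, "batch_size": 128},
    "finetune_mrtydi_in_language": {"lr": 1e-5, "epochs": 40, "batch_size": 128},
}
```

The chunking helper is the part that benefits most from being spelled out: it makes explicit that truncation happens at the sentence level (64 words) before the document-level split (32 sentences per chunk).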