Learning to Rank in Generative Retrieval
Authors: Yongqi Li, Nan Yang, Liang Wang, Furu Wei, Wenjie Li
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments on three public benchmarks, and the results demonstrate that LTRGR achieves state-of-the-art performance among generative retrieval methods. The code and checkpoints are released at https://github.com/liyongqi67/LTRGR. [...] We evaluate our proposed method on three widely used datasets, and the results demonstrate that LTRGR achieves the best performance in generative retrieval. [...] We conducted experiments using the DPR (Karpukhin et al. 2020) setting on two widely-used open-domain QA datasets: NQ (Kwiatkowski et al. 2019) and Trivia QA (Joshi et al. 2017). Additionally, we evaluated generative retrieval methods on the MSMARCO dataset (Nguyen et al. 2016)... |
| Researcher Affiliation | Collaboration | Yongqi Li¹, Nan Yang², Liang Wang², Furu Wei², Wenjie Li¹; ¹The Hong Kong Polytechnic University, ²Microsoft. liyongqi0@gmail.com, {nanya,wangliang,fuwei}@microsoft.com, cswjli@comp.polyu.edu.hk |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and checkpoints are released at https://github.com/liyongqi67/LTRGR. |
| Open Datasets | Yes | We conducted experiments using the DPR (Karpukhin et al. 2020) setting on two widely-used open-domain QA datasets: NQ (Kwiatkowski et al. 2019) and Trivia QA (Joshi et al. 2017). Additionally, we evaluated generative retrieval methods on the MSMARCO dataset (Nguyen et al. 2016). |
| Dataset Splits | No | For each query in the training set, we retrieved the top 200 passages and selected positive and negative passages from them. [...] We conducted experiments using the DPR (Karpukhin et al. 2020) setting on two widely-used open-domain QA datasets: NQ (Kwiatkowski et al. 2019) and Trivia QA (Joshi et al. 2017). Additionally, we evaluated generative retrieval methods on the MSMARCO dataset (Nguyen et al. 2016). The paper refers to a training set and a test set but gives no explicit split counts or percentages, does not describe a validation split, and does not point to the specific split files or cite standard splits in enough detail to reproduce them. |
| Hardware Specification | Yes | Our main experiments were conducted on a single NVIDIA A100 GPU with 80 GB of memory. [...] We conducted tests on LTRGR using a beam size of 15 on one V100 GPU with 32GB memory. |
| Software Dependencies | No | To ensure a fair comparison with previous work, we utilized BART-large as our backbone. [...] In the learning to rank phase, we used the Adam optimizer with a learning rate of 1e-5, trained with a batch size of 4, and conducted training for three epochs. The paper mentions software components like 'BART-large' and 'Adam optimizer' but does not specify their version numbers or the versions of any underlying frameworks (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | In the learning to rank phase, we used the Adam optimizer with a learning rate of 1e-5, trained with a batch size of 4, and conducted training for three epochs. For each query in the training set, we retrieved the top 200 passages and selected positive and negative passages from them. During training, we kept 40 predicted identifiers for each passage and removed any exceeding ones. The margin m and weight λ are set as 500 and 1000, respectively. |
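
The Experiment Setup row reports the key hyperparameters (Adam, learning rate 1e-5, batch size 4, three epochs, margin m = 500, weight λ = 1000). The snippet below is a minimal PyTorch-style sketch of what a learning-to-rank training step with these values could look like, assuming the objective combines a margin ranking loss over positive/negative passage scores with a λ-weighted generation loss; the paper's exact loss composition may differ, and `score_passage` and `generation_loss` are hypothetical helpers, not functions from the released repository.

```python
# Hedged sketch, NOT the authors' code: one training step combining a margin
# ranking loss with a lambda-weighted generation loss, using the reported
# hyperparameters (margin 500, lambda 1000, Adam at lr 1e-5).
import torch
import torch.nn.functional as F

MARGIN = 500.0    # margin m from the Experiment Setup row
LAMBDA = 1000.0   # weight lambda from the Experiment Setup row
LR = 1e-5


def ltr_step(model, optimizer, batch):
    """One learning-to-rank step: rank loss + LAMBDA * generation loss (sketch)."""
    # Hypothetical helpers: a real implementation would score each passage
    # from the autoregressive model's likelihood over its identifiers.
    pos_score = model.score_passage(batch["query"], batch["positive"])  # assumption
    neg_score = model.score_passage(batch["query"], batch["negative"])  # assumption

    # Margin ranking loss: require the positive passage to outscore the
    # negative one by at least MARGIN, i.e. max(0, m - (s_pos - s_neg)).
    rank_loss = F.relu(MARGIN - (pos_score - neg_score)).mean()

    # Standard sequence-to-sequence loss on generating the passage identifiers.
    gen_loss = model.generation_loss(batch["query"], batch["positive_ids"])  # assumption

    loss = rank_loss + LAMBDA * gen_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage sketch: optimizer = torch.optim.Adam(model.parameters(), lr=LR),
# then iterate ltr_step over the training set with batch size 4 for three epochs.
```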