Learning to Rank in Generative Retrieval
Authors: Yongqi Li, Nan Yang, Liang Wang, Furu Wei, Wenjie Li
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments on three public benchmarks, and the results demonstrate that LTRGR achieves state-of-the-art performance among generative retrieval methods. The code and checkpoints are released at https://github.com/liyongqi67/LTRGR. [...] We evaluate our proposed method on three widely used datasets, and the results demonstrate that LTRGR achieves the best performance in generative retrieval. [...] We conducted experiments using the DPR (Karpukhin et al. 2020) setting on two widely-used open-domain QA datasets: NQ (Kwiatkowski et al. 2019) and Trivia QA (Joshi et al. 2017). Additionally, we evaluated generative retrieval methods on the MSMARCO dataset (Nguyen et al. 2016)... |
| Researcher Affiliation | Collaboration | Yongqi Li¹, Nan Yang², Liang Wang², Furu Wei², Wenjie Li¹; ¹The Hong Kong Polytechnic University, ²Microsoft. liyongqi0@gmail.com, {nanya,wangliang,fuwei}@microsoft.com, cswjli@comp.polyu.edu.hk |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and checkpoints are released at https://github.com/liyongqi67/LTRGR. |
| Open Datasets | Yes | We conducted experiments using the DPR (Karpukhin et al. 2020) setting on two widely-used open-domain QA datasets: NQ (Kwiatkowski et al. 2019) and Trivia QA (Joshi et al. 2017). Additionally, we evaluated generative retrieval methods on the MSMARCO dataset (Nguyen et al. 2016). |
| Dataset Splits | No | For each query in the training set, we retrieved the top 200 passages and selected positive and negative passages from them. [...] We conducted experiments using the DPR (Karpukhin et al. 2020) setting on two widely-used open-domain QA datasets: NQ (Kwiatkowski et al. 2019) and Trivia QA (Joshi et al. 2017). Additionally, we evaluated generative retrieval methods on the MSMARCO dataset (Nguyen et al. 2016). The paper refers to a training set and a test set but gives no explicit split counts or percentages, does not describe a validation split, and does not point to the specific split files or cite standard splits in enough detail to reproduce them. |
| Hardware Specification | Yes | Our main experiments were conducted on a single NVIDIA A100 GPU with 80 GB of memory. [...] We conducted tests on LTRGR using a beam size of 15 on one V100 GPU with 32GB memory. |
| Software Dependencies | No | To ensure a fair comparison with previous work, we utilized BART-large as our backbone. [...] In the learning to rank phase, we used the Adam optimizer with a learning rate of 1e-5, trained with a batch size of 4, and conducted training for three epochs. The paper mentions software components like 'BART-large' and 'Adam optimizer' but does not specify their version numbers or the versions of any underlying frameworks (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | In the learning to rank phase, we used the Adam optimizer with a learning rate of 1e-5, trained with a batch size of 4, and conducted training for three epochs. For each query in the training set, we retrieved the top 200 passages and selected positive and negative passages from them. During training, we kept 40 predicted identifiers for each passage and removed any exceeding ones. The margin m and weight λ are set as 500 and 1000, respectively. |
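
The Experiment Setup row reports the key hyperparameters (Adam, learning rate 1e-5, batch size 4, three epochs, margin m = 500, weight λ = 1000). The snippet below is a minimal PyTorch-style sketch of what a learning-to-rank training step with these values could look like, assuming the objective combines a margin ranking loss over positive/negative passage scores with a λ-weighted generation loss; the paper's exact loss composition may differ, and `score_passage` and `generation_loss` are hypothetical helpers, not functions from the released repository.

```python
# Hedged sketch, NOT the authors' code: one training step combining a margin
# ranking loss with a lambda-weighted generation loss, using the reported
# hyperparameters (margin 500, lambda 1000, Adam at lr 1e-5).
import torch
import torch.nn.functional as F

MARGIN = 500.0    # margin m from the Experiment Setup row
LAMBDA = 1000.0   # weight lambda from the Experiment Setup row
LR = 1e-5


def ltr_step(model, optimizer, batch):
    """One learning-to-rank step: rank loss + LAMBDA * generation loss (sketch)."""
    # Hypothetical helpers: a real implementation would score each passage
    # from the autoregressive model's likelihood over its identifiers.
    pos_score = model.score_passage(batch["query"], batch["positive"])  # assumption
    neg_score = model.score_passage(batch["query"], batch["negative"])  # assumption

    # Margin ranking loss: require the positive passage to outscore the
    # negative one by at least MARGIN, i.e. max(0, m - (s_pos - s_neg)).
    rank_loss = F.relu(MARGIN - (pos_score - neg_score)).mean()

    # Standard sequence-to-sequence loss on generating the passage identifiers.
    gen_loss = model.generation_loss(batch["query"], batch["positive_ids"])  # assumption

    loss = rank_loss + LAMBDA * gen_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage sketch: optimizer = torch.optim.Adam(model.parameters(), lr=LR),
# then iterate ltr_step over the training set with batch size 4 for three epochs.
```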