Generative Retrieval Meets Multi-Graded Relevance
Authors: Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Xueqi Cheng
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR2. |
| Researcher Affiliation | Academia | ¹CAS Key Lab of Network Data Science and Technology, ICT, CAS; ²University of Chinese Academy of Sciences; ³University of Amsterdam |
| Pseudocode | No | The paper presents mathematical formulas and figures, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | The NeurIPS checklist states: 'Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: Section 5'. |
| Open Datasets | Yes | We select three widely-used multi-graded relevance datasets: Gov2 [18], ClueWeb09-B [19] and Robust04 [82]... Furthermore, we consider two binary relevance datasets: MS MARCO Document Ranking [57] and Natural Questions (NQ 320K) [38]. |
| Dataset Splits | Yes | The value of |r| is tuned on the validation set to optimize the trade-off between relevance and distinctness. |
| Hardware Specification | Yes | We train GR2 on eight NVIDIA Tesla A100 80GB GPUs. |
| Software Dependencies | Yes | GR2 and the reproduced baselines are implemented with PyTorch 1.9.0 and Hugging Face Transformers 4.16.2. |
| Experiment Setup | Yes | For hyperparameters, we use the Adam optimizer with a linear warm-up over the first 10% of steps. The learning rate is 5e-5, label smoothing is 0.1, weight decay is 0.01, the document sequence length is 512, the maximum number of training steps is 50K, and the batch size is 60. We train GR2 on eight NVIDIA Tesla A100 80GB GPUs. For more details, please see Appendix F. |
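
The reported setup maps onto a standard PyTorch/Transformers training configuration. Below is a minimal sketch, not the authors' released code: the hyperparameter values come from the "Experiment Setup" row above, while the model is a placeholder stand-in and the use of `CrossEntropyLoss(label_smoothing=...)` is an assumption (that argument requires PyTorch ≥ 1.10, whereas the paper reports 1.9.0, where smoothing would be applied manually).

```python
# Sketch of the reported GR2 training configuration (assumed mapping, not the official code).
import torch
from transformers import get_linear_schedule_with_warmup

MAX_STEPS = 50_000                 # max training steps
WARMUP_STEPS = MAX_STEPS // 10     # linear warm-up over the first 10% of steps
LEARNING_RATE = 5e-5
WEIGHT_DECAY = 0.01
BATCH_SIZE = 60
MAX_SEQ_LEN = 512                  # document sequence length
LABEL_SMOOTHING = 0.1

# Placeholder model: stands in for the seq2seq generative retrieval model used in the paper.
model = torch.nn.Linear(MAX_SEQ_LEN, MAX_SEQ_LEN)

optimizer = torch.optim.Adam(
    model.parameters(), lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY
)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=WARMUP_STEPS, num_training_steps=MAX_STEPS
)

# Label smoothing of 0.1 on the generation cross-entropy loss (assumed realization).
loss_fn = torch.nn.CrossEntropyLoss(label_smoothing=LABEL_SMOOTHING)
```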