Generative Retrieval Meets Multi-Graded Relevance

Authors: Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Xueqi Cheng

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR2.
Researcher Affiliation | Academia | CAS Key Lab of Network Data Science and Technology, ICT, CAS; University of Chinese Academy of Sciences; University of Amsterdam
Pseudocode | No | The paper presents mathematical formulas and figures, but no explicit pseudocode or algorithm blocks are provided.
Open Source Code | Yes | The NeurIPS checklist states: 'Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: Section 5'.
Open Datasets | Yes | We select three widely-used multi-graded relevance datasets: Gov2 [18], ClueWeb09-B [19] and Robust04 [82]... Furthermore, we consider two binary relevance datasets: MS MARCO Document Ranking [57] and Natural Questions (NQ320K) [38].
Dataset Splits | Yes | The value of |r| is tuned on the validation set to optimize the trade-off between relevance and distinctness.
Hardware Specification | Yes | We train GR2 on eight NVIDIA Tesla A100 80GB GPUs.
Software Dependencies | Yes | GR2 and the reproduced baselines are implemented with PyTorch 1.9.0 and Hugging Face transformers 4.16.2.
Experiment Setup | Yes | For hyperparameters, we use the Adam optimizer with a linear warm-up over the first 10% of steps. The learning rate is 5e-5, label smoothing is 0.1, weight decay is 0.01, sequence length of documents is 512, max training steps are 50K, and batch size is 60. We train GR2 on eight NVIDIA Tesla A100 80GB GPUs. For more details, please see Appendix F.
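
For readers who want to mirror this setup, the quoted hyperparameters map directly onto a Hugging Face TrainingArguments object under the pinned environment above (PyTorch 1.9.0, transformers 4.16.2). The sketch below is an illustrative reconstruction, not the authors' released configuration; the output directory and the per-GPU split of the reported batch size of 60 are assumptions.

```python
# Illustrative reconstruction of the reported GR2 training hyperparameters,
# assuming the pinned environment: pip install torch==1.9.0 transformers==4.16.2
# This is a sketch, not the authors' released configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./gr2_checkpoints",   # assumed path, not stated in the paper
    max_steps=50_000,                 # "max training steps are 50K"
    learning_rate=5e-5,               # reported learning rate
    weight_decay=0.01,                # reported weight decay
    label_smoothing_factor=0.1,       # reported label smoothing
    lr_scheduler_type="linear",       # linear warm-up then linear decay (Trainer default)
    warmup_ratio=0.1,                 # warm-up over the first 10% of steps
    per_device_train_batch_size=8,    # the paper reports a total batch size of 60 on
                                      # 8 GPUs; the per-device split is not specified
)
# Notes: the Trainer's default AdamW approximates the reported Adam + weight decay;
# the 512-token document length is applied at tokenization time, not here.
```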