Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Learning to Tokenize for Generative Retrieval
Authors: Weiwei Sun, Lingyong Yan, Zheng Chen, Shuaiqiang Wang, Haichao Zhu, Pengjie Ren, Zhumin Chen, Dawei Yin, Maarten de Rijke, Zhaochun Ren
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on the NQ320K, MS MARCO, and BEIR datasets. |
| Researcher Affiliation | Collaboration | Shandong University, China; Baidu Inc., China; University of Amsterdam, The Netherlands; Leiden University, The Netherlands |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code of this work is available at www.github.com/sunnweiwei/GenRet. |
| Open Datasets | Yes | We conduct experiments on three well-known document retrieval benchmark datasets, NQ320K [15, 37], MS MARCO [4, 46], and BEIR [38]. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits, percentages, or absolute sample counts for each split in the main text. It mentions 'training data' and 'test sets' but not the specific splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | Yes | The proposed models and the reproduced baselines are implemented with PyTorch 1.7.1 and Hugging Face transformers 4.22.2. |
| Experiment Setup | Yes | We utilize the T5-Base model [27] as the base Transformer and initialize a new codebook embedding E_t for each time step. We set the number of clusters to be K = 512 for all datasets, with the length of the docid M being dependent on the number of documents present. In the docid re-assignment, the hyper-parameter ϵ is set to 1.0, and the Sinkhorn-Knopp algorithm is executed for 100 iterations. We optimize the model using AdamW and set the learning rate to 5e-4. The batch size is 256, and the model is optimized for up to 500k steps for each timestep. We add a factor of 0.1 to the reconstruction losses to balance the scale. |
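
To make the Experiment Setup row easier to reuse, the sketch below gathers the quoted hyper-parameters into a single configuration object and illustrates what a Sinkhorn-Knopp balancing step with ϵ = 1.0 and 100 iterations might look like. It is not taken from the authors' repository: the class name `GenRetSetup`, its field names, and the function `sinkhorn_knopp` are hypothetical, and the normalization scheme is a generic textbook formulation that assumes ϵ acts as a temperature on the document-to-cluster scores.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GenRetSetup:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    base_model: str = "t5-base"
    num_clusters: int = 512               # K
    sinkhorn_epsilon: float = 1.0         # epsilon in the docid re-assignment
    sinkhorn_iterations: int = 100
    optimizer: str = "AdamW"
    learning_rate: float = 5e-4
    batch_size: int = 256
    max_steps_per_timestep: int = 500_000
    reconstruction_loss_weight: float = 0.1


def sinkhorn_knopp(scores, epsilon=1.0, iterations=100):
    """Generic Sinkhorn-Knopp balancing (an assumption, not the authors' code).

    Takes an n x k matrix of document-to-cluster scores and returns a soft
    assignment in which every row sums to 1 and every column sums to
    approximately n / k, so that clusters receive equal mass.
    """
    n, k = len(scores), len(scores[0])
    # Subtract the global maximum before exponentiating for numerical stability.
    top = max(max(row) for row in scores)
    q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    for _ in range(iterations):
        # Rescale columns so that each cluster holds mass n / k.
        col_sums = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] / col_sums[j] * (n / k) for j in range(k)] for i in range(n)]
        # Rescale rows so that each document's assignment sums to 1.
        q = [[value / sum(row) for value in row] for row in q]
    return q


if __name__ == "__main__":
    cfg = GenRetSetup()
    # Four toy documents that all prefer cluster 0 before balancing.
    toy_scores = [[2.0, 0.0], [1.5, 0.0], [1.0, 0.0], [0.5, 0.0]]
    balanced = sinkhorn_knopp(toy_scores, cfg.sinkhorn_epsilon, cfg.sinkhorn_iterations)
    for row in balanced:
        print([round(value, 3) for value in row])
```

In the toy example every document initially favors cluster 0; after balancing, the two clusters carry equal total mass, which is the property a balanced docid assignment relies on. Very small ϵ values can underflow `math.exp` to zero in this plain-Python version, so a log-domain variant would be safer outside of illustration.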