Bottleneck-Minimal Indexing for Generative Document Retrieval
Authors: Xin Du, Lixin Xiu, Kumiko Tanaka-Ishii
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically quantify the bottleneck underlying GDR. Finally, using the NQ320K and MARCO datasets, we evaluate our proposed bottleneck-minimal indexing method in comparison with various previous indexing methods, and we show that it outperforms those methods. |
| Researcher Affiliation | Academia | 1Waseda Research Institute for Science and Engineering, Waseda University 2Department of Mathematical Informatics, The University of Tokyo 3Department of Computer Science and Engineering, Waseda University. Correspondence to: Xin Du <duxin@aoni.waseda.jp>, Kumiko Tanaka-Ishii <kumiko@waseda.jp>. |
| Pseudocode | No | The paper describes methods and concepts but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | The code is available at https://github.com/kduxin/Bottleneck-Minimal-Indexing. |
| Open Datasets | Yes | We evaluated different indexing methods on two datasets: NQ320K (Kwiatkowski et al., 2019), and MARCO Lite, which is a subset extracted from the document ranking dataset in MS MARCO (Nguyen et al., 2016). |
| Dataset Splits | Yes | Table 1 gives descriptive statistics of the datasets and generated queries: NQ320K has 109,739 documents (mean 4,902.7 words), 307,373 training queries (mean 9.2 words), and 7,830 test queries (mean 9.3 words); MS MARCO Lite has 138,457 documents (mean 1,210.1 words), 183,947 training queries (mean 6.0 words), and 2,792 test queries (mean 5.9 words). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or specific computing environments used for experiments. |
| Software Dependencies | No | The paper mentions specific models like "T5-tiny", "T5-mini", "T5-small", "T5-base", and "BERT model", and links to huggingface models, but does not provide specific version numbers for software dependencies (e.g., "PyTorch 1.9", "Transformers 4.2") for its own implementation. |
| Experiment Setup | Yes | All models were trained using the default hyperparameters of NCI, as provided in its official GitHub repository. ... Updates to the parameters were implemented using the AdamW optimizer (Loshchilov & Hutter, 2017), with β1 = 0.9, β2 = 0.999, eps = 10^-8, and a weight decay of 0.01. The learning rate was set at 5 × 10^-5. This fine-tuning process was executed for 10 epochs on the training set of NQ320K. (A hedged optimizer-configuration sketch based on these reported values follows this table.) |
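
To make the reported fine-tuning setup concrete, below is a minimal sketch of an optimizer configuration using the hyperparameters quoted above, assuming a PyTorch / Hugging Face Transformers environment. The `t5-small` checkpoint is just one of the model sizes the paper mentions, and the commented training loop with a `train_loader` is a hypothetical placeholder, not the authors' implementation.

```python
# Sketch only: optimizer settings matching the values reported in the paper
# (AdamW, beta1=0.9, beta2=0.999, eps=1e-8, weight decay 0.01, lr 5e-5).
import torch
from transformers import T5ForConditionalGeneration

# The paper evaluates several T5 variants (tiny/mini/small/base); t5-small is
# used here purely for illustration.
model = T5ForConditionalGeneration.from_pretrained("t5-small")

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-5,             # learning rate reported in the paper
    betas=(0.9, 0.999),  # beta_1, beta_2
    eps=1e-8,
    weight_decay=0.01,
)

# The paper reports fine-tuning for 10 epochs on the NQ320K training set.
# `train_loader` is an assumed DataLoader yielding tokenized query/docid batches.
# for epoch in range(10):
#     for batch in train_loader:
#         loss = model(**batch).loss
#         loss.backward()
#         optimizer.step()
#         optimizer.zero_grad()
```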