IncDSI: Incrementally Updatable Document Retrieval

Authors: Varsha Kishore, Chao Wan, Justin Lovelace, Yoav Artzi, Kilian Q. Weinberger

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach by incrementally adding up to 10k documents to a trained retrieval model, evaluating both retrieval performance and the speed of adding documents.
Researcher Affiliation | Academia | School of Computer Science, Cornell University, Ithaca, USA. Correspondence to: Varsha Kishore <vk352@cornell.edu>, Justin Lovelace <jl3353@cornell.edu>.
Pseudocode | Yes | Algorithm 1 IncDSI
Open Source Code | Yes | Our code for IncDSI is available at https://github.com/varshakishore/IncDSI.
Open Datasets | Yes | We conduct our experiments on two publicly available datasets: Natural Questions 320K (Kwiatkowski et al., 2019) and MS MARCO Document Ranking (Nguyen et al., 2016).
Dataset Splits | Yes | We randomly sample 90% of the documents to form the initial set D0, 9% of the documents to form the new set D1, and 1% of the documents to form the tuning set Dtune. Each dataset also has natural human queries that are associated with the documents. We use the official NQ and MSMARCO train-validation splits to divide the queries into train/val/test splits as follows: the train split is divided into 80% train / 20% validation data, and the validation split is used as test data. (A split sketch appears after the table.)
Hardware Specification | Yes | For all our experiments, we use one A6000 GPU.
Software Dependencies | No | The paper mentions 'PyTorch' and the 'Ax library' but does not specify their version numbers, which are required for a reproducible description of software dependencies.
Experiment Setup | Yes | For the continual training baselines, the document retrieval model is trained for 20 epochs on the initial set of documents and for an additional 10 epochs on both the initial and new documents. Learning rates of 1e-5 and 5e-5 and batch sizes of 128 and 1024 are used for NQ320K and MSMARCO, respectively. (These settings are collected in the config sketch after the table.)
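
The 90/9/1 document split and 80/20 query split described in the Dataset Splits row can be reproduced in a few lines. The sketch below is a minimal illustration, assuming documents and queries are available as plain Python lists; the helper names (split_documents, split_train_queries) and the seed are hypothetical and are not taken from the IncDSI codebase.

```python
import random

def split_documents(doc_ids, seed=0):
    """Randomly split document IDs 90/9/1 into the initial set D0,
    the new set D1, and the tuning set Dtune (hypothetical helper,
    mirroring the split described in the paper)."""
    rng = random.Random(seed)
    ids = list(doc_ids)
    rng.shuffle(ids)
    n = len(ids)
    n_initial = int(0.90 * n)
    n_new = int(0.09 * n)
    d0 = ids[:n_initial]                    # 90%: initial documents
    d1 = ids[n_initial:n_initial + n_new]   # 9%:  documents added incrementally
    d_tune = ids[n_initial + n_new:]        # ~1%: hyperparameter-tuning documents
    return d0, d1, d_tune

def split_train_queries(train_queries, seed=0):
    """Divide the official train split 80/20 into train/validation;
    the official validation split is held out as the test set."""
    rng = random.Random(seed)
    queries = list(train_queries)
    rng.shuffle(queries)
    cut = int(0.80 * len(queries))
    return queries[:cut], queries[cut:]
```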
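
The Experiment Setup row pins down the continual-training baseline schedule and the per-dataset hyperparameters. The dictionary below collects the values reported in the paper; the layout and key names are my own illustration, not structures from the released code.

```python
# Hyperparameters for the continual training baselines, as reported in the
# paper; the dictionary itself is an illustrative config, not from the repo.
CONTINUAL_TRAINING_CONFIG = {
    "NQ320K": {
        "learning_rate": 1e-5,
        "batch_size": 128,
        "initial_epochs": 20,    # training on the initial document set D0
        "continual_epochs": 10,  # further training on initial + new documents
    },
    "MSMARCO": {
        "learning_rate": 5e-5,
        "batch_size": 1024,
        "initial_epochs": 20,
        "continual_epochs": 10,
    },
}
```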