A Gradient Accumulation Method for Dense Retriever under Memory Constraint
Authors: Jaehee Kim, Yukyung Lee, Pilsung Kang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on five widely used information retrieval datasets indicate that CONTACCUM can surpass not only existing memory reduction methods but also the high-resource scenario. |
| Researcher Affiliation | Academia | 1Seoul National University 2Boston University {jaehee_kim, pilsung_kang}@snu.ac.kr ylee5@bu.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | All code and links to download the datasets are included in the supplemental material. Additionally, we plan to release the code for reproducibility of the main experimental results after the review process to preserve anonymity. |
| Open Datasets | Yes | The datasets used for the experiments were Natural Questions (NQ) [18], Trivia QA [15], Curated TREC (TREC) [1], and Web Questions (Web Q) [2] processed by DPR and MS Marco [26]. |
| Dataset Splits | Yes | The optimal memory bank size, N_memory, was selected using evaluation data with candidates [128, 512, 2048], resulting in 2,048 for NQ and 512 for Trivia QA. For MS Marco, Web Q, and TREC, due to the lack of evaluation data, N_memory was set based on dataset size: 1,024 for MS Marco, and 128 for Web Q and TREC. |
| Hardware Specification | Yes | All experiments were conducted on a single A100 80GB GPU. For the high-resource scenario, we considered situations where 80GB of memory is available. For low-resource settings, we assumed the memory available on widely used commercial GPUs: 11GB (GTX-1080Ti), 24GB (RTX-3080Ti, RTX-4090Ti). |
| Software Dependencies | No | The experimental code was adapted from nano-DPR, which provides a simplified training and evaluation pipeline for DPR. All experiments were conducted using the BERT [6] model. For retrieval, we used the FAISS [14] library to perform exact nearest neighbor search with default hyperparameters. Using the torch.cuda.set_per_process_memory_fraction function in PyTorch [27] allows for restricting the memory used during training, regardless of the total available memory. |
| Experiment Setup | Yes | The hyperparameters for training were set as follows: the warmup step was 1,237 steps, weight decay was set to 0, and a customized scheduler with a linear decay of the learning rate after the warmup was used. The optimizer was AdamW [23] with epsilon set to 1e-8, and the learning rate was 2e-5. Gradient clipping was applied at a value of 2.0, and τ was set to 1. |
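The memory-restriction setup described under Hardware Specification and Software Dependencies (emulating an 11 GB or 24 GB card on an 80 GB A100 via `torch.cuda.set_per_process_memory_fraction`) could be reproduced along these lines. This is a minimal sketch, not code from the paper; `cap_gpu_memory` is a hypothetical helper name.

```python
import torch


def cap_gpu_memory(limit_gb: float, device: int = 0) -> float:
    """Restrict this process to roughly `limit_gb` of GPU memory.

    Hypothetical helper mirroring the paper's use of
    torch.cuda.set_per_process_memory_fraction to emulate smaller
    commercial GPUs (e.g. 11 GB GTX-1080Ti) on an 80 GB A100.
    """
    total_bytes = torch.cuda.get_device_properties(device).total_memory
    # Fraction of the physical card that corresponds to the target limit.
    fraction = min(1.0, limit_gb * 1024**3 / total_bytes)
    torch.cuda.set_per_process_memory_fraction(fraction, device)
    return fraction


if torch.cuda.is_available():
    # e.g. emulate an 11 GB card; allocations beyond the cap raise OOM.
    frac = cap_gpu_memory(11.0)
    print(f"capped to {frac:.4f} of total GPU memory")
```

On an 80 GB A100 this would set the fraction to 11/80 = 0.1375, so out-of-memory errors occur at the same point they would on the smaller card.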
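The learning-rate schedule in the Experiment Setup row (1,237 warmup steps, peak learning rate 2e-5, linear decay after warmup) can be sketched as a plain function. `total_steps` is an assumed placeholder; the excerpt does not state the total number of training steps.

```python
def lr_at_step(step: int,
               base_lr: float = 2e-5,
               warmup_steps: int = 1237,
               total_steps: int = 20000) -> float:
    """Linear warmup to base_lr, then linear decay to 0.

    Sketch of the customized scheduler described in the setup;
    total_steps is an assumption, not a value from the paper.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: ramp linearly from base_lr down to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

In practice this shape matches what `transformers.get_linear_schedule_with_warmup` produces when attached to the AdamW optimizer with the stated epsilon (1e-8) and peak learning rate.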