HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
Authors: Bernal Jimenez Gutierrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, Yu Su
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare HippoRAG with existing RAG methods on multi-hop question answering (QA) and show that our method outperforms the state-of-the-art methods remarkably, by up to 20%. |
| Researcher Affiliation | Academia | Bernal Jiménez Gutiérrez (The Ohio State University), Yiheng Shu (The Ohio State University), Yu Gu (The Ohio State University), Michihiro Yasunaga (Stanford University), Yu Su (The Ohio State University) |
| Pseudocode | No | The paper describes its methodology in prose and figures (e.g., Figure 2, Figure 4, Figure 5), but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code and data are available at https://github.com/OSU-NLP-Group/HippoRAG. |
| Open Datasets | Yes | We evaluate our method's retrieval capabilities primarily on two challenging multi-hop QA benchmarks, MuSiQue (answerable) [77] and 2WikiMultiHopQA [33]. |
| Dataset Splits | Yes | To limit the experimental cost, we extract 1,000 questions from each validation set as done in previous work [63, 78]. In order to create a more realistic retrieval setting, we follow IRCoT [78] and collect all candidate passages (including supporting and distractor passages) from our selected questions and form a retrieval corpus for each dataset. The details of these datasets are shown in Table 1. |
| Hardware Specification | Yes | To run ColBERTv2 and Contriever for indexing and retrieval, we use 4 NVIDIA RTX A6000 GPUs with 48GB of memory. For indexing with Llama-3.1 models, we use 4 NVIDIA H100 GPUs with 80GB of memory. Finally, we used 2 AMD EPYC 7513 32-Core Processors to run the Personalized PageRank algorithm. |
| Software Dependencies | No | We use implementations based on PyTorch [59] and Hugging Face [86] for both Contriever [35] and ColBERTv2 [70]. We use the python-igraph [13] implementation of the PPR algorithm. For BM25, we employ Elasticsearch [24]. |
| Experiment Setup | Yes | By default, we use GPT-3.5-turbo-1106 [55] with a temperature of 0 as our LLM L and Contriever [35] or ColBERTv2 [70] as our retriever M. We use 100 examples from MuSiQue's training data to tune HippoRAG's two hyperparameters: the synonymy threshold τ at 0.8 and the PPR damping factor at 0.5, which determines the probability that PPR will restart a random walk from the query nodes instead of continuing to explore the graph. |