A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Authors: Kuang-Huei Lee, Xinyun Chen, Hiroki Furuta, John Canny, Ian Fischer
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ReadAgent against baselines using retrieval methods, using the original long contexts, and using the gist memories. These evaluations are performed on three long-document reading comprehension tasks: QuALITY, NarrativeQA, and QMSum. |
| Researcher Affiliation | Industry | Google DeepMind. Correspondence to: Kuang-Huei Lee <leekh@google.com>, Ian Fischer <iansf@google.com>. |
| Pseudocode | No | The paper describes the steps of ReadAgent using prose and example prompts, but it does not include formal pseudocode blocks or algorithms. |
| Open Source Code | Yes | Project website and demo: read-agent.github.io. We release the prompts for each task on read-agent.github.io. |
| Open Datasets | Yes | We evaluate ReadAgent's long-document reading comprehension ability on three long-context question-answering challenges: QuALITY (Pang et al., 2022), NarrativeQA (Kočiský et al., 2018) and QMSum (Zhong et al., 2021). |
| Dataset Splits | Yes | Although ReadAgent does not require any model training, we develop the proposed method on the training sets and test on the validation, test and/or development sets to avoid any risk of overfitting system hyperparameters. |
| Hardware Specification | No | The paper mentions using "instruction-tuned PaLM 2-L" and "GPT-3.5 Turbo", which are language models, not hardware specifications like GPU models, CPU models, or cloud computing instances. |
| Software Dependencies | No | The paper mentions using "instruction-tuned PaLM 2-L (Anil et al., 2023)", "GPT-3.5 Turbo", and the "Gemini API embedding model (models/embedding-001)", but it does not provide specific version numbers for general software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | As described in Section 3.1, max words and min words are two episode pagination hyperparameters. Table 8 gives their values for each of the experiments in Section 4. |
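The max_words and min_words hyperparameters in the last row bound how long each "page" of the long context can be during episode pagination. In the paper, ReadAgent asks the LLM itself to choose natural break points between those bounds; the greedy paragraph-boundary heuristic below, including the function name `paginate` and the threshold values shown, is a hypothetical simplification for illustration, not the authors' implementation.

```python
def paginate(paragraphs, min_words=280, max_words=600):
    """Greedy sketch of episode pagination: accumulate paragraphs into a
    page until the word count reaches min_words; never let a page exceed
    max_words. (ReadAgent instead prompts an LLM to pick break points
    within these bounds; paragraph boundaries are a stand-in here.)"""
    pages, current, count = [], [], 0
    for para in paragraphs:
        words = len(para.split())
        # Force a break if adding this paragraph would overshoot max_words.
        if current and count + words > max_words:
            pages.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
        # Once the soft minimum is reached, close the page.
        if count >= min_words:
            pages.append("\n\n".join(current))
            current, count = [], 0
    if current:
        pages.append("\n\n".join(current))
    return pages
```

Each page would then be gisted independently, with the gists concatenated into the gist memory that the agent consults before deciding which original pages to re-read.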