Augmenting Language Models with Long-Term Memory
Authors: Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our method outperforms strong long-context models on ChapterBreak, a challenging long-context modeling benchmark, and achieves remarkable improvements on memory-augmented in-context learning over LLMs. |
| Researcher Affiliation | Collaboration | University of California, Santa Barbara Microsoft Research weizhiwang@ucsb.edu, {lidong1, chehao, xiaodl}@microsoft.com |
| Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured code blocks. |
| Open Source Code | Yes | Our code is open-sourced at https://aka.ms/LongMem. |
| Open Datasets | Yes | We sample a subset of the Pile [GBB+20] as the training corpus, including BookCorpus2, Books3, OpenWebText2, Stack Exchange, Wikipedia, Gutenberg (PG-19), NIH ExPorter, and Pile-CC datasets. |
| Dataset Splits | Yes | We provide different validation splits of PG-22 based on length range, and the data statistics are presented in Table 1. |
| Hardware Specification | Yes | The pre-training and adaptation are trained on 16 32GB-Tesla-V100 GPUs. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' and 'faiss toolkit' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The training for memory-augmented adaptation iterates over 26B tokens, with a global batch size of 256 and a sequence length of 1024. The chunk size csz is 4 tokens and the memory size M is 65k key-value pairs of tokens. For each token, we retrieve K=64 attention key-value pairs for augmentation, i.e., K/csz=16 text chunks. The memory-augmentation layer is the 9th layer of the SideNet. The attention keys and values from the 18th layer of the backbone LLM are cached into memory and used for future retrieval. Other training details are presented in Appendix C. |
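
The memory configuration described in the Experiment Setup row (chunk size 4, a 65k-token key-value cache, and K=64 retrieved key-value pairs per query token, i.e., 16 chunks) can be illustrated with a minimal token-to-chunk retrieval sketch using the faiss toolkit mentioned under Software Dependencies. This is not the released LongMem implementation: the head dimension, the random placeholder cache, the mean-pooled chunk keys, and the flat inner-product index are assumptions made purely for illustration.

```python
# Minimal sketch of chunked key-value memory retrieval (illustrative only).
# See https://aka.ms/LongMem for the authors' released code.
import numpy as np
import faiss

D = 64                       # per-head key/value dimension (assumed for illustration)
CSZ = 4                      # chunk size in tokens (paper: csz = 4)
MEM_TOKENS = 65_536          # memory size M (paper: 65k cached key-value pairs)
K_TOKENS = 64                # retrieved key-value pairs per query token (paper: K = 64)
K_CHUNKS = K_TOKENS // CSZ   # = 16 retrieved text chunks

# Cached attention keys/values from the frozen backbone layer (random placeholders here).
cached_keys = np.random.randn(MEM_TOKENS, D).astype("float32")
cached_values = np.random.randn(MEM_TOKENS, D).astype("float32")

# Token-to-chunk retrieval: index one mean-pooled key per csz-token chunk (an assumption).
chunk_keys = cached_keys.reshape(-1, CSZ, D).mean(axis=1)
index = faiss.IndexFlatIP(D)   # exact inner-product search over chunk-level keys
index.add(chunk_keys)

def retrieve(query: np.ndarray):
    """Return the K_TOKENS cached key/value pairs for a single query vector."""
    _, chunk_ids = index.search(query.reshape(1, -1).astype("float32"), K_CHUNKS)
    # Expand each retrieved chunk id back into its csz token positions.
    token_ids = (chunk_ids[0][:, None] * CSZ + np.arange(CSZ)).reshape(-1)
    return cached_keys[token_ids], cached_values[token_ids]

q = np.random.randn(D)
keys, values = retrieve(q)
print(keys.shape, values.shape)  # (64, 64) (64, 64): K = 64 retrieved key-value pairs
```

In this sketch the retrieved keys and values would then be attended to by the memory-augmentation layer of the SideNet; how that fused attention is computed is described in the paper itself rather than reproduced here.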