Augmenting Language Models with Long-Term Memory

Authors: Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our method outperforms strong long-context models on ChapterBreak, a challenging long-context modeling benchmark, and achieves remarkable improvements on memory-augmented in-context learning over LLMs.
Researcher Affiliation | Collaboration | University of California, Santa Barbara; Microsoft Research. weizhiwang@ucsb.edu, {lidong1, chehao, xiaodl}@microsoft.com
Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured code blocks.
Open Source Code | Yes | Our code is open-sourced at https://aka.ms/LongMem.
Open Datasets | Yes | We sample a subset of the Pile [GBB+20] as the training corpus, including BookCorpus2, Books3, OpenWebText2, Stack Exchange, Wikipedia, Gutenberg (PG-19), NIH ExPorter, and Pile-CC datasets.
Dataset Splits | Yes | We provide different validation splits of PG-22 based on length range, and the data statistics are presented in Table 1.
Hardware Specification | Yes | The pre-training and adaptation are trained on 16 32GB Tesla V100 GPUs.
Software Dependencies | No | The paper mentions using the Adam optimizer and the faiss toolkit but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | The training for memory-augmented adaptation iterates over 26B tokens, with a global batch size of 256 and a sequence length of 1024. The chunk size csz is 4 tokens and the memory size M is 65k key-value pairs of tokens. For each token, we retrieve K=64 attention key-value pairs for augmentation, i.e., K/csz=16 text chunks. The memory-augmentation layer is the 9th layer of the SideNet. The attention keys and values from the 18th layer of the backbone LLM are cached into memory and used for future retrieval. Other training details are presented in Appendix C.
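
To make the retrieval quantities in the Experiment Setup row concrete (chunk size csz = 4 tokens, memory size M of roughly 65k key-value pairs, K = 64 retrieved pairs per token, i.e. 16 chunks), below is a minimal sketch of a chunked key-value memory with faiss-based retrieval. The class name MemoryBank, the mean-pooled chunk keys, and the inner-product index are illustrative assumptions, not the authors' released implementation (see https://aka.ms/LongMem for the actual code).

```python
# Illustrative sketch only: a chunked key-value memory in the spirit of the
# setup quoted above. The faiss index type, mean-pooled chunk keys, and all
# names here are assumptions, not the paper's released implementation.
import numpy as np
import faiss


class MemoryBank:
    def __init__(self, dim: int, memory_size: int = 65536, chunk_size: int = 4):
        self.dim = dim
        self.memory_size = memory_size        # M: ~65k cached key-value pairs
        self.chunk_size = chunk_size          # csz: tokens per retrieval chunk
        self.index = faiss.IndexFlatIP(dim)   # inner-product search over chunk keys
        self.keys, self.values = [], []       # token-level keys/values per chunk

    def cache(self, keys: np.ndarray, values: np.ndarray) -> None:
        """Cache attention keys/values, e.g. those taken from the 18th backbone layer.

        Eviction of the oldest chunks once memory_size is exceeded is omitted here.
        """
        for start in range(0, len(keys) - self.chunk_size + 1, self.chunk_size):
            chunk_k = keys[start:start + self.chunk_size]
            chunk_v = values[start:start + self.chunk_size]
            # Represent each chunk by the mean of its keys for retrieval.
            self.index.add(chunk_k.mean(axis=0, keepdims=True).astype("float32"))
            self.keys.append(chunk_k)
            self.values.append(chunk_v)

    def retrieve(self, query: np.ndarray, k_pairs: int = 64):
        """Return the top K=64 key-value pairs (K/csz = 16 chunks) for one token query."""
        n_chunks = k_pairs // self.chunk_size
        _, chunk_ids = self.index.search(
            query.astype("float32").reshape(1, -1), n_chunks
        )
        retrieved_keys = np.concatenate([self.keys[i] for i in chunk_ids[0]])
        retrieved_values = np.concatenate([self.values[i] for i in chunk_ids[0]])
        return retrieved_keys, retrieved_values  # fed to the memory-augmentation layer
```

A quick usage check with random arrays standing in for real attention states:

```python
bank = MemoryBank(dim=64)
bank.cache(np.random.randn(1024, 64), np.random.randn(1024, 64))
k, v = bank.retrieve(np.random.randn(64))   # k.shape == v.shape == (64, 64)
```

In this sketch, indexing at chunk granularity keeps the search index csz times smaller than token-level indexing while still handing token-level key-value pairs back to the augmentation layer.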