Memoria: Resolving Fateful Forgetting Problem through Human-Inspired Memory Architecture

Authors: Sangjun Park, JinYeong Bak

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results prove the effectiveness of Memoria in the diverse tasks of sorting, language modeling, and classification, surpassing conventional techniques.
Researcher Affiliation | Academia | Department of Computer Science and Engineering, Sungkyunkwan University, Suwon, South Korea. Correspondence to: JinYeong Bak <jy.bak@skku.edu>.
Pseudocode | Yes | Algorithm 1 Retrieve Stage
Open Source Code | Yes | The implementation of Memoria and all experimental code are publicly available at https://github.com/cosmoquester/memoria
Open Datasets | Yes | Secondly, we performed language modeling for token-level on WikiText-103 (Raw) (Merity et al., 2017) and PG-19 (Rae et al., 2020), and character-level on enwik8 (Mahoney, 2006). ... Lastly, we conducted the classification task on the long document classification dataset, Hyperpartisan (Kiesel et al., 2019). (See the dataset-loading sketch after this table.)
Dataset Splits | Yes | We report validation and test set results because of data distribution discrepancies.
Hardware Specification | Yes | One or more NVIDIA A100 or A6000 GPUs were used for training.
Software Dependencies | No | The paper mentions software such as the GPT-2 tokenizer, the Adam optimizer, a linear scheduler, and PyTorch, but does not provide version numbers for these dependencies.
Experiment Setup | Yes | For all sorting experiments, a batch size of 32, a warmup rate of 0.06, a learning rate of 2e-4, and an epoch of 5 were used for 80,000 train examples. Memoria parameters used in the experiment were as follows: an initial lifespan of 5, a lifespan extension scale α of 8, and a long-term memory search depth N_depth of 10 in all cases. (See the training-configuration sketch after this table.)