Compressed Context Memory for Online Language Model Interaction

Authors: Jang-Hyun Kim, Junyoung Yeom, Sangdoo Yun, Hyun Oh Song

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through evaluations on conversation, personalization, and multi-task learning, we demonstrate that our approach achieves the performance level of a full context model with 5× smaller context memory size. We further demonstrate the applicability of our approach in a streaming setting with an unlimited context length, outperforming the sliding window approach. Codes are available at https://github.com/snu-mllab/context-memory.
Researcher Affiliation | Collaboration | Jang-Hyun Kim (1,2), Junyoung Yeom (1,2), Sangdoo Yun (3), Hyun Oh Song (1,2); (1) Seoul National University, (2) Artificial Intelligence Institute (AIIS), (3) NAVER AI Lab
Pseudocode | Yes (see the sketch after this table) | Algorithm 1: Training stage for compression
Open Source Code | Yes | Codes are available at https://github.com/snu-mllab/context-memory.
Open Datasets | Yes (see the loading sketch after this table) | Datasets and metrics: We conduct evaluations using three datasets: MetaICL (Min et al., 2022), LaMP (Salemi et al., 2023), and DailyDialog (Li et al., 2017).
Dataset Splits | No | The paper does not specify validation splits for its primary datasets (MetaICL, LaMP, DailyDialog) in the main experiment setup. It mentions the PG19 validation set in the context of streaming evaluation (Figure 8), but this is a pre-existing dataset split rather than a split created or defined for the authors' model training and hyperparameter tuning across all experiments.
Hardware Specification | Yes | Individual training runs take 3 to 24 hours on a single NVIDIA A100 with 80GB memory.
Software Dependencies | No | The paper mentions LLaMA pretrained models (Touvron et al., 2023) and FP16 mixed precision but does not provide specific version numbers for software dependencies such as PyTorch, Python, or CUDA.
Experiment Setup | Yes (see the configuration sketch after this table) | Table 8 reports the training recipes for the LLaMA models: training steps, batch size, number of training samples, learning rate, learning-rate scheduling, mixed precision, and COMP token length. Table 9 reports the LoRA configurations for the LLaMA models: target modules, rank, alpha, and dropout.
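The Pseudocode row above points to Algorithm 1 (training stage for compression). The snippet below is a minimal, hypothetical sketch of how such a recursive compression training step could look with a Hugging Face causal LM: each context turn is encoded together with a <COMP> token, only the key/value states at the <COMP> position are kept as the compressed memory, and the next turn is scored conditioned on that memory alone. The function names, the `comp_token_id` / `dialogue_turns` inputs, and the `keep_last_positions` helper are illustrative assumptions, not the authors' implementation (see the linked repository for the actual Algorithm 1).

```python
# Hypothetical sketch of a recursive compression training step, not the authors'
# Algorithm 1 verbatim (see https://github.com/snu-mllab/context-memory for the
# reference implementation). Assumes a Hugging Face causal LM returning key/value
# caches in the legacy tuple format (per-layer (key, value) tensors).
import torch


def keep_last_positions(past_key_values, num_keep):
    # Keep only the last `num_keep` positions of every layer's key/value cache,
    # i.e. the states at the <COMP> token(s), as the compressed memory.
    return tuple(
        (k[:, :, -num_keep:, :], v[:, :, -num_keep:, :])
        for k, v in past_key_values
    )


def compression_training_step(model, tokenizer, comp_token_id, dialogue_turns, optimizer):
    """One step: compress each context turn into a short key/value memory,
    then compute the LM loss on the following turn conditioned on that memory."""
    memory = None          # compressed key/value memory accumulated so far
    total_loss = 0.0
    for context, target in dialogue_turns:   # list of (context, target) strings
        # 1) Encode the context followed by a <COMP> token, attending to the
        #    previously compressed memory through `past_key_values`.
        ctx_ids = tokenizer(context, return_tensors="pt").input_ids
        comp_ids = torch.tensor([[comp_token_id]])
        out = model(
            input_ids=torch.cat([ctx_ids, comp_ids], dim=1),
            past_key_values=memory,
            use_cache=True,
        )
        # 2) Retain only the <COMP> position as the updated compressed memory.
        memory = keep_last_positions(out.past_key_values, num_keep=1)
        # 3) Language-modeling loss on the target turn, conditioned on memory alone.
        tgt_ids = tokenizer(target, return_tensors="pt").input_ids
        tgt_out = model(
            input_ids=tgt_ids,
            past_key_values=memory,
            labels=tgt_ids,
            use_cache=True,
        )
        total_loss = total_loss + tgt_out.loss

    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return float(total_loss)
```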
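The Open Datasets row lists MetaICL, LaMP, and DailyDialog. As a small illustration of pulling one of these public datasets for a reproduction attempt, the snippet below loads DailyDialog from the Hugging Face Hub and turns a dialogue into (context, response) pairs; the pairing logic is an assumption for demonstration, not the paper's preprocessing, and some `datasets` versions may additionally require `trust_remote_code=True` for this script-based dataset.

```python
# Illustrative loading of DailyDialog (Li et al., 2017) from the Hugging Face Hub.
# The (context, response) pairing below is an assumption for demonstration and is
# not the paper's exact preprocessing pipeline.
from datasets import load_dataset


def dialog_to_pairs(utterances):
    # Treat everything said so far as the context and the next utterance as the target.
    return [(" ".join(utterances[:i]), utterances[i]) for i in range(1, len(utterances))]


# Depending on the installed `datasets` version, trust_remote_code=True may be required.
dailydialog = load_dataset("daily_dialog", split="train")
pairs = dialog_to_pairs(dailydialog[0]["dialog"])
print(f"{len(pairs)} (context, response) pairs from the first dialogue")
```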
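The Experiment Setup row points to Table 9 (LoRA configurations: target modules, rank, alpha, dropout). The sketch below shows how such a configuration maps onto the `peft` library; the checkpoint id, target-module list, and numeric values are placeholders, not the values reported in the paper's Table 9, which should be taken from the paper or the released code.

```python
# Hypothetical LoRA setup mirroring the fields of the paper's Table 9
# (target modules, rank, alpha, dropout). All concrete values are placeholders.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")  # assumed checkpoint id

lora_config = LoraConfig(
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # placeholder module list
    r=8,                # rank (placeholder)
    lora_alpha=16,      # alpha (placeholder)
    lora_dropout=0.05,  # dropout (placeholder)
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports how few parameters are trainable
```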