Compressed Context Memory for Online Language Model Interaction
Authors: Jang-Hyun Kim, Junyoung Yeom, Sangdoo Yun, Hyun Oh Song
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through evaluations on conversation, personalization, and multi-task learning, we demonstrate that our approach achieves the performance level of a full context model with 5× smaller context memory size. We further demonstrate the applicability of our approach in a streaming setting with an unlimited context length, outperforming the sliding window approach. Codes are available at https://github.com/snu-mllab/context-memory. |
| Researcher Affiliation | Collaboration | Jang-Hyun Kim (1, 2), Junyoung Yeom (1, 2), Sangdoo Yun (3), Hyun Oh Song (1, 2); 1: Seoul National University, 2: Artificial Intelligence Institute (AIIS), 3: NAVER AI Lab |
| Pseudocode | Yes | Algorithm 1 Training stage for compression |
| Open Source Code | Yes | Codes are available at https://github.com/snu-mllab/context-memory. |
| Open Datasets | Yes | Datasets and metrics: We conduct evaluations using three datasets: MetaICL (Min et al., 2022), LaMP (Salemi et al., 2023), and DailyDialog (Li et al., 2017). |
| Dataset Splits | No | The paper does not specify validation splits for its primary datasets (MetaICL, LaMP, DailyDialog) in the main experiment setup. It mentions the 'PG19 validation set' in the context of streaming evaluation (Figure 8), but this is a pre-existing split rather than one created or defined for model training and hyperparameter tuning across the experiments. |
| Hardware Specification | Yes | Individual training runs take 3 to 24 hours on a single NVIDIA A100 with 80GB memory. |
| Software Dependencies | No | The paper mentions 'LLaMA pretrained models (Touvron et al., 2023)' and 'FP16' for mixed precision but does not provide specific version numbers for software libraries or dependencies such as PyTorch, Python, or CUDA. |
| Experiment Setup | Yes | Table 8: Training recipes of our experiments for LLaMA models. Includes 'Training steps', 'Batch size', '# training samples', 'Learning rate', 'Learning rate scheduling', 'Mixed precision', 'COMP token length'. Table 9: LoRA configurations for LLaMA models. Includes 'Target modules', 'Rank', 'Alpha', 'Dropout'. (A hedged configuration sketch based on these fields appears below the table.) |
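
As a rough illustration of the kind of setup Tables 8 and 9 describe, the sketch below wires LoRA adapters into a LLaMA checkpoint using Hugging Face `transformers` and `peft`. Every concrete value (checkpoint name, target modules, rank, alpha, dropout, and the training-recipe entries) is a placeholder, since the report quotes only the field names, not the values used in the paper.

```python
# Hypothetical LoRA fine-tuning setup mirroring the fields listed in
# Tables 8 and 9; all concrete values are placeholders, not the paper's recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder LLaMA checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Table 9 fields: target modules, rank, alpha, dropout (values are guesses).
lora_config = LoraConfig(
    target_modules=["q_proj", "v_proj"],  # placeholder attention projections
    r=8,                 # "Rank"
    lora_alpha=16,       # "Alpha"
    lora_dropout=0.05,   # "Dropout"
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Table 8 fields, again with placeholder values.
training_recipe = {
    "training_steps": 10_000,
    "batch_size": 128,
    "learning_rate": 1e-4,
    "lr_schedule": "cosine",     # "Learning rate scheduling"
    "mixed_precision": "fp16",   # matches the FP16 mention in the paper
    "comp_token_length": 1,      # number of <COMP> tokens per compression step
}
```

The split into a `LoraConfig` and a plain training-recipe dictionary simply mirrors the paper's Table 9 / Table 8 division; in practice the recipe values would feed a standard causal-LM training loop on a single A100, as noted in the Hardware Specification row.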