In-context Autoencoder for Context Compression in a Large Language Model

Authors: Tao Ge, Hu Jing, Lei Wang, Xun Wang, Si-Qing Chen, Furu Wei

ICLR 2024

Reproducibility assessment (each entry gives the variable, the result, and the supporting LLM response):
Research Type: Experimental
LLM Response: Experiments demonstrate that our lightweight ICAE, introducing about 1% additional parameters, effectively achieves 4× context compression based on Llama, offering advantages in both improved latency and GPU memory cost during inference, and showing an interesting insight in memorization as well as potential for scalability. These promising results imply a novel perspective on the connection between working memory in cognitive science and representation learning in LLMs, revealing ICAE's significant implications in addressing the long context problem and suggesting further research in LLM context management.
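To make the "4× context compression" claim concrete, below is a minimal, self-contained sketch of the memory-slot idea: learnable memory tokens are appended to the context, and their final hidden states serve as the compressed representation that a (frozen) decoder LLM would condition on. The ToyICAEEncoder class, its dimensions, and the plain TransformerEncoder standing in for the LoRA-adapted Llama encoder are illustrative assumptions, not the authors' implementation (see https://github.com/getao/icae for the actual code).

```python
# Toy sketch of memory-slot compression (illustrative stand-in, not the authors' model).
import torch
import torch.nn as nn

class ToyICAEEncoder(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_layers=4, n_memory=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learnable "memory token" embeddings appended after the context tokens.
        self.memory_tokens = nn.Parameter(torch.randn(n_memory, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, context_ids):
        # context_ids: (batch, context_len), e.g. 512 tokens compressed 4x into 128 slots.
        ctx = self.embed(context_ids)
        mem = self.memory_tokens.unsqueeze(0).expand(ctx.size(0), -1, -1)
        hidden = self.encoder(torch.cat([ctx, mem], dim=1))
        # The final hidden states at the memory-token positions are the compact
        # "memory slots" the decoder would attend to instead of the full context.
        return hidden[:, -self.memory_tokens.size(0):, :]

enc = ToyICAEEncoder()
slots = enc(torch.randint(0, 32000, (1, 512)))
print(slots.shape)  # torch.Size([1, 128, 512]) -> 4x shorter than the 512-token context
```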
Researcher Affiliation: Industry
LLM Response: Tao Ge, Jing Hu, Lei Wang, Xun Wang, Si-Qing Chen, Furu Wei. Microsoft Corporation. {tage,v-hjing,v-leiwang7,xunwang,sqchen,fuwei}@microsoft.com
Pseudocode: No
LLM Response: The paper includes figures illustrating the model and training processes, but no structured pseudocode or algorithm blocks.
Open Source Code: Yes
LLM Response: Our data, code and models are available at https://github.com/getao/icae.
Open Datasets: Yes
LLM Response: We pretrain the ICAE with the Pile (Gao et al., 2020). For instruction fine-tuning, we use the PWC dataset, as introduced in Section 2.3, which contains 240k (context, prompt, response) samples for training and 18k samples for testing. Our data, code and models are available at https://github.com/getao/icae.
Dataset Splits: No
LLM Response: The dataset comprises 240k examples for training and 18k examples for testing. No explicit validation split information (e.g., percentages or counts) is provided.
Hardware Specification: Yes
LLM Response: We train the ICAE on 8 Nvidia A100 GPUs (80GB). We test the latency (Section 3.3.2) on 1 Nvidia A100 GPU (80GB). The test machine has an AMD EPYC 7413 CPU with 24 cores and 216GB RAM.
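Since the latency claims are tested on a single A100, a generic single-GPU timing harness like the following is how such numbers are usually obtained. This sketch is an assumption for reproduction purposes, not the authors' benchmarking code; `model` and `inputs` are placeholders.

```python
# Generic single-GPU latency measurement sketch (assumed harness, not from the paper's repo).
import torch

def measure_latency(model, inputs, warmup=5, iters=20):
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):            # warm-up passes to exclude CUDA init overhead
            model(**inputs)
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            model(**inputs)
        end.record()
        torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average milliseconds per forward pass
```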
Software Dependencies: Yes
LLM Response: The runtime configuration is python=3.9, pytorch=2.0.1, cuda=11.7, cudnn=8.5.
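A quick way to verify a local environment against the reported runtime configuration is a version-check snippet like the one below. The check script itself is an assumption (it is not part of the repository); only the expected version strings come from the paper.

```python
# Environment check against the reported runtime configuration (script is assumed, versions are from the paper).
import sys
import torch

print("python :", sys.version.split()[0])           # expected 3.9.x
print("pytorch:", torch.__version__)                 # expected 2.0.1
print("cuda   :", torch.version.cuda)                # expected 11.7
print("cudnn  :", torch.backends.cudnn.version())    # expected 85xx, i.e. cuDNN 8.5
```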
Experiment Setup: Yes
LLM Response: The hyperparameters for pretraining and fine-tuning ICAE are presented in Table 8.
Table 8: Hyperparameters for training
Optimizer: AdamW
Learning rate: 1e-4 (pretrain); 5e-5 (fine-tuning)
Batch size: 256
Warmup: 300
#Updates: 200k (pretrain); 30k (fine-tuning)
Clip norm: 2.0
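For orientation, here is a sketch of how an optimizer and schedule matching Table 8's pretraining values could be wired up in PyTorch. The placeholder model, the dummy loss, and the linear-warmup-then-constant schedule shape are assumptions; only the numeric values (learning rate, batch size, warmup steps, update counts, clip norm) come from the table.

```python
# Sketch of an optimizer/scheduler setup mirroring Table 8's pretraining values (assumed wiring).
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(8, 8)                   # placeholder for the trainable ICAE parameters
optimizer = AdamW(model.parameters(), lr=1e-4)  # 5e-5 for fine-tuning
warmup_steps, total_steps = 300, 200_000        # 30k updates for fine-tuning

def lr_lambda(step):
    # Linear warmup over the first 300 updates, then constant (schedule shape is assumed).
    return min(1.0, (step + 1) / warmup_steps)

scheduler = LambdaLR(optimizer, lr_lambda)

for step in range(total_steps):
    loss = model(torch.randn(256, 8)).pow(2).mean()  # dummy loss; batch size 256
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=2.0)  # clip norm 2.0
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    break  # remove this break to run the full update schedule
```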