In-context Autoencoder for Context Compression in a Large Language Model
Authors: Tao Ge, Jing Hu, Lei Wang, Xun Wang, Si-Qing Chen, Furu Wei
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that our lightweight ICAE, introducing about 1% additional parameters, effectively achieves 4× context compression based on Llama, offering advantages in both improved latency and GPU memory cost during inference, and showing an interesting insight in memorization as well as potential for scalability. These promising results imply a novel perspective on the connection between working memory in cognitive science and representation learning in LLMs, revealing ICAE's significant implications in addressing the long context problem and suggesting further research in LLM context management. (A back-of-the-envelope sketch of the 4× and 1% figures follows the table.) |
| Researcher Affiliation | Industry | Tao Ge, Jing Hu, Lei Wang, Xun Wang, Si-Qing Chen, Furu Wei (Microsoft Corporation) {tage,v-hjing,v-leiwang7,xunwang,sqchen,fuwei}@microsoft.com |
| Pseudocode | No | The paper includes figures illustrating the model and training processes, but no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our data, code and models are available at https://github.com/getao/icae. |
| Open Datasets | Yes | We pretrain the ICAE with the Pile (Gao et al., 2020). For instruction fine-tuning, we use the PWC dataset, as introduced in Section 2.3, which contains 240k (context, prompt, response) samples for training and 18k samples for testing. Our data, code and models are available at https://github.com/getao/icae. |
| Dataset Splits | No | The dataset comprises 240k training examples and 18k test examples; no explicit validation split (e.g., percentages or counts) is reported. |
| Hardware Specification | Yes | We train the ICAE on 8 Nvidia A100 GPUs (80GB). We test the latency (Section 3.3.2) on 1 Nvidia A100 GPU (80GB). The test machine has an AMD EPYC 7413 CPU (24 cores) and 216GB of RAM. |
| Software Dependencies | Yes | The runtime configuration is python=3.9, pytorch=2.0.1, cuda=11.7, cudnn=8.5. (See the version-check sketch after the table.) |
| Experiment Setup | Yes | The hyperparameters for pretraining and fine-tuning ICAE are presented in Table 8: optimizer AdamW; learning rate 1e-4 (pretrain), 5e-5 (fine-tuning); batch size 256; warmup 300 steps; #updates 200k (pretrain), 30k (fine-tuning); clip norm 2.0. (See the training-loop sketch after the table.) |
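
For concreteness, here is a back-of-the-envelope check of the two headline figures quoted in the Research Type row. The 512-token context, 128 memory slots, and 7B base model size are illustrative assumptions, not values stated in this table.

```python
# Rough arithmetic behind "4x context compression" and "about 1% additional parameters".
# Assumed (not stated in this table): 512-token contexts, 128 memory slots, Llama-7B base.
context_tokens = 512
memory_slots = 128
print(f"compression ratio: {context_tokens / memory_slots:.0f}x")    # -> 4x

base_params = 7e9                     # frozen Llama decoder (assumed 7B)
extra_params = 0.01 * base_params     # "about 1% additional parameters"
print(f"trainable overhead: ~{extra_params / 1e6:.0f}M parameters")  # -> ~70M
```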
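
The Software Dependencies row can be checked against a local environment with standard Python/PyTorch introspection; the snippet below is a generic version probe, not part of the authors' code.

```python
# Verify that a local environment matches the reported runtime configuration
# (python=3.9, pytorch=2.0.1, cuda=11.7, cudnn=8.5).
import sys
import torch

print("python :", sys.version.split()[0])          # expect 3.9.x
print("pytorch:", torch.__version__)                # expect 2.0.1
print("cuda   :", torch.version.cuda)               # expect 11.7
print("cudnn  :", torch.backends.cudnn.version())   # expect a value in the 85xx range
```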
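
The Experiment Setup values map directly onto a standard PyTorch training loop. The sketch below wires the Table 8 hyperparameters into AdamW with linear warmup and gradient clipping; the tiny stand-in model, the random batches, and the constant post-warmup schedule are assumptions, not the authors' implementation (see their repository for the actual code).

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = nn.Linear(16, 16)                       # stand-in for the ICAE trainable parameters
optimizer = AdamW(model.parameters(), lr=1e-4)  # pretraining LR; 5e-5 for fine-tuning
warmup_steps, total_steps = 300, 200_000        # 200k pretraining updates; 30k for fine-tuning

# Linear warmup over the first 300 updates, then constant LR (the post-warmup shape is assumed).
scheduler = LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))

for step in range(total_steps):
    batch = torch.randn(256, 16)                # batch size 256 (random stand-in data)
    loss = model(batch).pow(2).mean()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=2.0)  # clip norm 2.0
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```

For fine-tuning, the same loop applies with lr=5e-5 and total_steps=30_000.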