In-context Autoencoder for Context Compression in a Large Language Model

Authors: Tao Ge, Hu Jing, Lei Wang, Xun Wang, Si-Qing Chen, Furu Wei

ICLR 2024

Reproducibility assessment (each entry gives the variable, the result, and the supporting LLM response):
Research Type: Experimental
LLM Response: Experiments demonstrate that our lightweight ICAE, introducing about 1% additional parameters, effectively achieves 4× context compression based on Llama, offering advantages in both improved latency and GPU memory cost during inference, and showing an interesting insight in memorization as well as potential for scalability. These promising results imply a novel perspective on the connection between working memory in cognitive science and representation learning in LLMs, revealing ICAE's significant implications in addressing the long context problem and suggesting further research in LLM context management.
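To make the "4× context compression" claim concrete, below is a minimal, self-contained sketch of the memory-slot idea: learnable memory tokens are appended to the context, and their final hidden states serve as the compressed representation that a (frozen) decoder LLM would condition on. The ToyICAEEncoder class, its dimensions, and the plain TransformerEncoder standing in for the LoRA-adapted Llama encoder are illustrative assumptions, not the authors' implementation (see https://github.com/getao/icae for the actual code).

```python
# Toy sketch of memory-slot compression (illustrative stand-in, not the authors' model).
import torch
import torch.nn as nn

class ToyICAEEncoder(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_layers=4, n_memory=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learnable "memory token" embeddings appended after the context tokens.
        self.memory_tokens = nn.Parameter(torch.randn(n_memory, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, context_ids):
        # context_ids: (batch, context_len), e.g. 512 tokens compressed 4x into 128 slots.
        ctx = self.embed(context_ids)
        mem = self.memory_tokens.unsqueeze(0).expand(ctx.size(0), -1, -1)
        hidden = self.encoder(torch.cat([ctx, mem], dim=1))
        # The final hidden states at the memory-token positions are the compact
        # "memory slots" the decoder would attend to instead of the full context.
        return hidden[:, -self.memory_tokens.size(0):, :]

enc = ToyICAEEncoder()
slots = enc(torch.randint(0, 32000, (1, 512)))
print(slots.shape)  # torch.Size([1, 128, 512]) -> 4x shorter than the 512-token context
```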
Researcher Affiliation: Industry
LLM Response: Tao Ge, Jing Hu, Lei Wang, Xun Wang, Si-Qing Chen, Furu Wei. Microsoft Corporation. {tage,v-hjing,v-leiwang7,xunwang,sqchen,fuwei}@microsoft.com
Pseudocode: No
LLM Response: The paper includes figures illustrating the model and training processes, but no structured pseudocode or algorithm blocks.
Open Source Code: Yes
LLM Response: Our data, code and models are available at https://github.com/getao/icae.
Open Datasets: Yes
LLM Response: We pretrain the ICAE with the Pile (Gao et al., 2020). For instruction fine-tuning, we use the PWC dataset, as introduced in Section 2.3, which contains 240k (context, prompt, response) samples for training and 18k samples for testing. Our data, code and models are available at https://github.com/getao/icae.
Dataset Splits: No
LLM Response: The dataset comprises 240k examples for training and 18k examples for testing. No explicit validation split information (e.g., percentages or counts) is provided.
Hardware Specification: Yes
LLM Response: We train the ICAE on 8 Nvidia A100 GPUs (80GB). We test the latency (Section 3.3.2) on 1 Nvidia A100 GPU (80GB). The test machine has an AMD EPYC 7413 CPU with 24 cores and 216GB RAM.
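Since the latency claims are tested on a single A100, a generic single-GPU timing harness like the following is how such numbers are usually obtained. This sketch is an assumption for reproduction purposes, not the authors' benchmarking code; `model` and `inputs` are placeholders.

```python
# Generic single-GPU latency measurement sketch (assumed harness, not from the paper's repo).
import torch

def measure_latency(model, inputs, warmup=5, iters=20):
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):            # warm-up passes to exclude CUDA init overhead
            model(**inputs)
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            model(**inputs)
        end.record()
        torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average milliseconds per forward pass
```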
Software Dependencies: Yes
LLM Response: The runtime configuration is python=3.9, pytorch=2.0.1, cuda=11.7, cudnn=8.5.
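A quick way to verify a local environment against the reported runtime configuration is a version-check snippet like the one below. The check script itself is an assumption (it is not part of the repository); only the expected version strings come from the paper.

```python
# Environment check against the reported runtime configuration (script is assumed, versions are from the paper).
import sys
import torch

print("python :", sys.version.split()[0])           # expected 3.9.x
print("pytorch:", torch.__version__)                 # expected 2.0.1
print("cuda   :", torch.version.cuda)                # expected 11.7
print("cudnn  :", torch.backends.cudnn.version())    # expected 85xx, i.e. cuDNN 8.5
```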
Experiment Setup: Yes
LLM Response: The hyperparameters for pretraining and fine-tuning ICAE are presented in Table 8.
Table 8: Hyperparameters for training
Optimizer: AdamW
Learning rate: 1e-4 (pretrain); 5e-5 (fine-tuning)
Batch size: 256
Warmup: 300
#Updates: 200k (pretrain); 30k (fine-tuning)
Clip norm: 2.0
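For orientation, here is a sketch of how an optimizer and schedule matching Table 8's pretraining values could be wired up in PyTorch. The placeholder model, the dummy loss, and the linear-warmup-then-constant schedule shape are assumptions; only the numeric values (learning rate, batch size, warmup steps, update counts, clip norm) come from the table.

```python
# Sketch of an optimizer/scheduler setup mirroring Table 8's pretraining values (assumed wiring).
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(8, 8)                   # placeholder for the trainable ICAE parameters
optimizer = AdamW(model.parameters(), lr=1e-4)  # 5e-5 for fine-tuning
warmup_steps, total_steps = 300, 200_000        # 30k updates for fine-tuning

def lr_lambda(step):
    # Linear warmup over the first 300 updates, then constant (schedule shape is assumed).
    return min(1.0, (step + 1) / warmup_steps)

scheduler = LambdaLR(optimizer, lr_lambda)

for step in range(total_steps):
    loss = model(torch.randn(256, 8)).pow(2).mean()  # dummy loss; batch size 256
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=2.0)  # clip norm 2.0
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    break  # remove this break to run the full update schedule
```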