Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs
Authors: Woomin Song, Seunghyuk Oh, Sangwoo Mo, Jaehyung Kim, Sukmin Yun, Jung-Woo Ha, Jinwoo Shin
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate the proposed method's superior performance and memory efficiency, enabling the broader use of LLMs in contexts requiring extended context. ... Through extensive evaluation on downstream tasks and perplexity measurements, we demonstrate that HOMER can effectively extend pre-trained LLMs to handle long inputs beyond their context limits. |
| Researcher Affiliation | Collaboration | Woomin Song¹, Seunghyuk Oh¹, Sangwoo Mo², Jaehyung Kim³, Sukmin Yun⁴, Jung-Woo Ha⁵, Jinwoo Shin¹ (¹KAIST, ²University of Michigan, ³Carnegie Mellon University, ⁴Hanyang University ERICA, ⁵NAVER) |
| Pseudocode | Yes | Algorithm 1 Memory-efficient computation ordering |
| Open Source Code | Yes | Code is available at https://github.com/alinlab/HOMER. |
| Open Datasets | Yes | We select Llama-2 as our base model... To this end, we sample 25 long documents from the PG-19 dataset (Rae et al., 2019)... To this end, we measure the model's performance on the validation set of QuALITY (Pang et al., 2021). |
| Dataset Splits | Yes | We measure the model's performance on the validation set of QuALITY (Pang et al., 2021). ... Calibration is performed using 100 text corpora segments from the validation set and the test set of WikiText-103 (Merity et al., 2016). |
| Hardware Specification | Yes | All efficiency measurements are done with a single A100 GPU. ... All measurements are taken on a single A100 GPU, with Flash Attention 2 (Dao, 2023) applied. |
| Software Dependencies | No | The paper mentions 'Flash Attention 2 (Dao, 2023)' and 'Llama-2' but does not specify version numbers for these or other software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We select Llama-2 as our base model... In all experiments involving HOMER, the maximum chunk length was set to be half of the context limit. We assign 12 additional layers for 7b models and 20 layers for 13b models. Calibration is performed using 100 text corpora segments from the validation set and the test set of WikiText-103 (Merity et al., 2016). |
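
The Open Datasets and Dataset Splits rows reference three public corpora (PG-19, QuALITY, WikiText-103). Below is a minimal data-loading sketch using Hugging Face `datasets`; the Hub identifiers, the split used for PG-19 sampling, and the local QuALITY path are assumptions, not details taken from the paper or its repository (the official QuALITY release is distributed at https://github.com/nyu-mll/quality).

```python
from datasets import load_dataset

# PG-19: the paper samples 25 long documents for perplexity measurement.
# The split choice here is illustrative; older versions of this dataset
# may require trust_remote_code=True.
pg19 = load_dataset("pg19", split="test")
long_docs = pg19.shuffle(seed=0).select(range(25))

# WikiText-103: the validation and test splits supply the 100 calibration segments.
wikitext_val = load_dataset("wikitext", "wikitext-103-raw-v1", split="validation")
wikitext_test = load_dataset("wikitext", "wikitext-103-raw-v1", split="test")

# QuALITY: the validation (dev) set is used for downstream QA evaluation.
# The path below is a placeholder for the JSONL dev file from the official release.
quality_val = load_dataset(
    "json", data_files={"validation": "path/to/QuALITY.dev.jsonl"}, split="validation"
)
```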
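The Experiment Setup row reports a handful of numeric hyperparameters. The sketch below only collects those reported values in one place; the field names are illustrative and are not the repository's actual configuration interface (see https://github.com/alinlab/HOMER for the real API).

```python
from dataclasses import dataclass

@dataclass
class HomerSetup:
    """Experiment-setup values reported in the paper (illustrative field names)."""
    base_model: str                   # Llama-2 checkpoint used as the base model
    context_limit: int                # pre-trained context window (4096 for Llama-2)
    additional_layers: int            # "additional layers": 12 for 7B, 20 for 13B
    calibration_segments: int = 100   # WikiText-103 segments used for calibration

    @property
    def max_chunk_len(self) -> int:
        # The maximum chunk length is set to half of the context limit.
        return self.context_limit // 2

# Reported settings for the two model sizes.
setup_7b = HomerSetup("meta-llama/Llama-2-7b-hf", context_limit=4096, additional_layers=12)
setup_13b = HomerSetup("meta-llama/Llama-2-13b-hf", context_limit=4096, additional_layers=20)
assert setup_7b.max_chunk_len == 2048
```

Efficiency numbers in the table above were measured on a single A100 GPU with Flash Attention 2 applied, so timing and memory figures reproduced on other hardware or attention kernels may differ.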