LLM Maybe LongLM: SelfExtend LLM Context Window Without Tuning
Authors: Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Zirui Liu, Chia-Yuan Chang, Huiyuan Chen, Xia Hu
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments on multiple benchmarks and the results show that our SelfExtend can effectively extend existing LLMs' context window length. and We evaluate SelfExtend with Llama-2 (Touvron et al., 2023), Mistral (Jiang et al., 2023a), SOLAR (Kim et al., 2023), and Phi-2 (Javaheripi et al., 2023) on language modeling tasks, synthetic long context tasks, real-world long context tasks, and standard short-context tasks. |
| Researcher Affiliation | Collaboration | 1Texas A&M University 2Amazon, the views expressed or the conclusions reached are his own and do not represent the view of Amazon 3Rice University 4Case Western Reserve University. Correspondence to: Hongye Jin <jhy0410@tamu.edu>. |
| Pseudocode | Yes | Algorithm 1: PyTorch-style Pseudocode of SelfExtend |
| Open Source Code | Yes | The code can be found at https://github.com/datamllab/LongLM. |
| Open Datasets | Yes | We evaluate SelfExtend's language modeling performance on the PG19 dataset (Rae et al., 2019), which contains lengthy books. and We further use two recent real-world long context benchmarks: LongBench (Bai et al., 2023) and L-Eval (An et al., 2023). |
| Dataset Splits | No | The paper does not explicitly state training/validation/test dataset splits (e.g., percentages, sample counts, or specific predefined split citations) for all datasets used. While it mentions using a 'test set' for some evaluations, comprehensive split information is not provided. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, or memory) used to conduct the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch-style Pseudocode' but does not specify version numbers for PyTorch or any other software dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | For Llama-2 & Llama-2-chat based SelfExtend, the group size is 16 and the neighbor window is 1024; for Mistral based SelfExtend, the group size is 6 and the neighbor window is 1024; for Phi-2 based SelfExtend, the group size is 12 and the neighbor window is 512 (see the sketch below this table). |
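
Since Algorithm 1 is reproduced here only by name, the following minimal sketch illustrates the bi-level position remapping that the group size and neighbor window parameters control: relative positions within the neighbor window are kept exact, while more distant positions are floor-divided by the group size and shifted so the two regimes meet at the window boundary. This is our own illustrative code, not the authors' released implementation (see https://github.com/datamllab/LongLM for that); the function name `self_extend_rel_pos` and its signature are assumptions.

```python
import torch

def self_extend_rel_pos(seq_len: int, group_size: int, neighbor_window: int) -> torch.Tensor:
    """Bi-level relative-position map in the spirit of SelfExtend (illustrative).

    Positions within `neighbor_window` keep their exact relative distance;
    more distant positions are floor-divided by `group_size` and shifted so
    the grouped regime lines up with the neighbor regime at the boundary.
    """
    pos = torch.arange(seq_len)
    rel = pos[:, None] - pos[None, :]          # standard relative positions
    grouped = pos[:, None] // group_size - pos[None, :] // group_size
    shift = neighbor_window - neighbor_window // group_size
    return torch.where(rel <= neighbor_window, rel, grouped + shift)

# Paper's Llama-2 setting: group size 16, neighbor window 1024.
m = self_extend_rel_pos(seq_len=2048, group_size=16, neighbor_window=1024)
print(int(m.max()))  # 1087: distant positions are compressed by the grouping
```

Under the Llama-2 setting (group size 16, neighbor window 1024), even a 16,384-token sequence maps to a maximum relative position of floor(16383/16) + (1024 - 1024//16) = 1023 + 960 = 1983, which stays inside Llama-2's 4096-token pretraining window; this is why the method can extend the context window without any fine-tuning.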