LLM Maybe LongLM: SelfExtend LLM Context Window Without Tuning
Authors: Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Zirui Liu, Chia-Yuan Chang, Huiyuan Chen, Xia Hu
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments on multiple benchmarks and the results show that our SelfExtend can effectively extend existing LLMs' context window length. and We evaluate SelfExtend with Llama-2 (Touvron et al., 2023), Mistral (Jiang et al., 2023a), SOLAR (Kim et al., 2023), and Phi-2 (Javaheripi et al., 2023) on language modeling tasks, synthetic long context tasks, real-world long context tasks, and standard short-context tasks. |
| Researcher Affiliation | Collaboration | 1Texas A&M University 2Amazon, the views expressed or the conclusions reached are his own and do not represent the view of Amazon 3Rice University 4Case Western Reserve University. Correspondence to: Hongye Jin <jhy0410@tamu.edu>. |
| Pseudocode | Yes | Algorithm 1: PyTorch-style Pseudocode of SelfExtend |
| Open Source Code | Yes | The code can be found at https://github.com/datamllab/LongLM. |
| Open Datasets | Yes | We evaluate SelfExtend's language modeling performance on the PG19 dataset (Rae et al., 2019), which contains lengthy books. and We further use two recent real-world long context benchmarks: LongBench (Bai et al., 2023) and L-Eval (An et al., 2023). |
| Dataset Splits | No | The paper does not explicitly state training/validation/test dataset splits (e.g., percentages, sample counts, or specific predefined split citations) for all datasets used. While it mentions using a 'test set' for some evaluations, comprehensive split information is not provided. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, or memory) used to conduct the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch-style Pseudocode' but does not specify version numbers for PyTorch or any other software dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | For Llama-2 & Llama-2-chat based SelfExtend, the group size is 16 and the neighbor window is 1024; for Mistral based SelfExtend, the group size is 6 and the neighbor window is 1024; for Phi-2 based SelfExtend, the group size is 12 and the neighbor window is 512 (see the sketch below this table). |
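
Since Algorithm 1 is reproduced here only by name, the following minimal sketch illustrates the bi-level position remapping that the group size and neighbor window parameters control: relative positions within the neighbor window are kept exact, while more distant positions are floor-divided by the group size and shifted so the two regimes meet at the window boundary. This is our own illustrative code, not the authors' released implementation (see https://github.com/datamllab/LongLM for that); the function name `self_extend_rel_pos` and its signature are assumptions.

```python
import torch

def self_extend_rel_pos(seq_len: int, group_size: int, neighbor_window: int) -> torch.Tensor:
    """Bi-level relative-position map in the spirit of SelfExtend (illustrative).

    Positions within `neighbor_window` keep their exact relative distance;
    more distant positions are floor-divided by `group_size` and shifted so
    the grouped regime lines up with the neighbor regime at the boundary.
    """
    pos = torch.arange(seq_len)
    rel = pos[:, None] - pos[None, :]          # standard relative positions
    grouped = pos[:, None] // group_size - pos[None, :] // group_size
    shift = neighbor_window - neighbor_window // group_size
    return torch.where(rel <= neighbor_window, rel, grouped + shift)

# Paper's Llama-2 setting: group size 16, neighbor window 1024.
m = self_extend_rel_pos(seq_len=2048, group_size=16, neighbor_window=1024)
print(int(m.max()))  # 1087: distant positions are compressed by the grouping
```

Under the Llama-2 setting (group size 16, neighbor window 1024), even a 16,384-token sequence maps to a maximum relative position of floor(16383/16) + (1024 - 1024//16) = 1023 + 960 = 1983, which stays inside Llama-2's 4096-token pretraining window; this is why the method can extend the context window without any fine-tuning.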