Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs
Authors: Xin Ma, Yang Liu, Jingjing Liu, Xiaoxu Ma
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate the effectiveness of Mesa-Extrapolation, demonstrating its potential as a scalable solution to enhancing LLMs' applicative reach. |
| Researcher Affiliation | Collaboration | Xin Ma (1), Yang Liu (2,3), Jingjing Liu (2), Xiaoxu Ma (1). (1) Digital Research Institute, ENN Group, Beijing, China; (2) Institute for AI Industry Research, Tsinghua University, Beijing, China; (3) Shanghai Artificial Intelligence Laboratory, China |
| Pseudocode | Yes | Algorithm 1 Mesa-Extrapolation Algorithm |
| Open Source Code | Yes | Our code is available at https://github.com/soacker/Mesa-Extrapolation. |
| Open Datasets | Yes | We choose GovReport Huang et al. (2021), Pile Gao et al. (2020), LongBench Bai et al. (2023), and LongEval Krishna et al. (2023) datasets, and also generate a passkey dataset, which has been integrated in the code warehouse. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | Yes | We use a 2x A800 80GB NVIDIA GPU server as the experimental environment and adopt the PyTorch framework. |
| Software Dependencies | No | The paper only mentions the "PyTorch framework" without providing specific version numbers for software dependencies. |
| Experiment Setup | Yes | In general, we set F = 100, Mmax = 200 and L = 512. Additionally, Stair PE is primarily employed in the manipulation of the last chunk... In Eq. 3, we generally set the extrapolated position N = 512 and set the extrapolated width E = 50. |
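The Open Datasets row notes that the authors generate a passkey dataset and bundle it with the released code; the exact generator lives in their repository. As a hedged illustration of the standard passkey-retrieval construction (a random 5-digit key buried in repeated filler sentences, with prompt length controlled by the amount of filler), a minimal sketch might look like the following. The function name, filler text, and prompt wording here are illustrative assumptions, not the authors' implementation:

```python
import random


def make_passkey_prompt(n_filler: int, seed: int = 0) -> tuple[str, str]:
    """Build one passkey-retrieval example: a 5-digit key hidden in filler text.

    n_filler controls how many filler sentences surround the key, which in
    turn controls the total prompt length (i.e., the context length probed).
    Returns (prompt, passkey).
    """
    rng = random.Random(seed)
    passkey = str(rng.randint(10000, 99999))
    filler = ("The grass is green. The sky is blue. "
              "The sun is yellow. Here we go. There and back again. ")
    key_line = (f"The pass key is {passkey}. Remember it. "
                f"{passkey} is the pass key. ")
    # Hide the key line at a random position among the filler sentences.
    insert_at = rng.randint(0, n_filler)
    body = filler * insert_at + key_line + filler * (n_filler - insert_at)
    prompt = ("There is an important info hidden inside a lot of irrelevant "
              "text. Find it and memorize it. I will quiz you about it.\n"
              + body
              + "\nWhat is the pass key? The pass key is")
    return prompt, passkey
```

Sweeping `n_filler` upward yields prompts of increasing length, so retrieval accuracy can be measured as a function of context length, which is how passkey benchmarks typically probe extrapolation.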