DAPE: Data-Adaptive Positional Encoding for Length Extrapolation

Authors: Chuanyang Zheng, Yihang Gao, Han Shi, Minbin Huang, Jingyao Li, Jing Xiong, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental validation on real-world datasets (Arxiv, Books3, and CHE) demonstrates that DAPE improves model performance both at the training length and in length generalization, with statistically significant improvements.
Researcher Affiliation | Collaboration | Chuanyang Zheng (CUHK), Yihang Gao (NUS), Han Shi (Noah's Ark Lab), Minbin Huang (CUHK), Jingyao Li (CUHK), Jing Xiong (HKU), Xiaozhe Ren (Noah's Ark Lab), Michael Ng (HKBU), Xin Jiang (Noah's Ark Lab), Zhenguo Li (Noah's Ark Lab), Yu Li (CUHK)
Pseudocode | No | Appendix J provides a full PyTorch implementation code block, not pseudocode.
Open Source Code | Yes | We have made our code publicly available to other researchers in the field. This initiative aims to facilitate a standardized comparison and evaluation of their respective methods, thereby advancing the collective understanding of model performance in relation to perplexity calculations. In this section, we present the implementation of the proposed DAPE module in PyTorch [49]. (Appendix J) [...] Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We have provided the PyTorch implementation code in Appendix Section J.
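The authoritative implementation is the PyTorch code in Appendix J of the paper. For orientation only, below is a minimal, hypothetical sketch of a data-adaptive positional bias in the spirit of DAPE: a small MLP over the per-head channels of the attention logits and a static ALiBi-style bias produces an additional, input-dependent bias term. The names `DAPEAttention`, `alibi_bias`, and `bias_mlp`, the hidden width, and the exact way the adaptive bias is combined with the logits are assumptions here, not taken from the authors' code.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Static ALiBi-style bias of shape (num_heads, seq_len, seq_len). Sketch only."""
    slopes = torch.tensor([2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)])
    distance = torch.arange(seq_len)[None, :] - torch.arange(seq_len)[:, None]  # j - i
    return slopes[:, None, None] * distance[None, :, :].clamp(max=0).float()


class DAPEAttention(nn.Module):
    """Sketch of a data-adaptive positional bias: a small per-position MLP maps the
    concatenated head channels of (attention logits, static bias) to a new bias."""

    def __init__(self, dim: int, num_heads: int, hidden: int = 32):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        # MLP over the head dimension: 2*H channels (logits + static bias) -> H channels.
        self.bias_mlp = nn.Sequential(
            nn.Linear(2 * num_heads, hidden),
            nn.GELU(),
            nn.Linear(hidden, num_heads),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2) for z in (q, k, v))
        logits = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)        # (b, H, t, t)
        static = alibi_bias(self.num_heads, t).to(x.device).unsqueeze(0)   # (1, H, t, t)
        static = static.expand(b, -1, -1, -1)
        # Data-adaptive bias: MLP over the head channels of [logits; static bias].
        feats = torch.cat([logits, static], dim=1).permute(0, 2, 3, 1)     # (b, t, t, 2H)
        adaptive = self.bias_mlp(feats).permute(0, 3, 1, 2)                # (b, H, t, t)
        causal = torch.triu(torch.full((t, t), float("-inf"), device=x.device), diagonal=1)
        # How the static and adaptive terms are combined is an assumption of this sketch.
        attn = F.softmax(logits + static + adaptive + causal, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.proj(out)
```

In this sketch the bias network acts only on the head dimension at each query-key position, so its overhead scales with sequence length squared and the number of heads, not with the model width; for the exact DAPE formulation, consult Appendix J of the paper.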
Open Datasets | Yes | Our analysis involves training language models on the Arxiv and Books3 datasets, which are frequently used benchmarks for evaluating model performance [52, 13, 41, 24].
Dataset Splits | Yes | Our analysis involves training language models on the Arxiv and Books3 datasets, which are frequently used benchmarks for evaluating model performance [52, 13, 41, 24].
Hardware Specification | Yes | All experiments are conducted on 8 x A800 GPUs.
Software Dependencies | No | No specific version numbers for software dependencies (e.g., 'PyTorch 1.9') were found. Appendix J mentions 'PyTorch [49]' but without a version.
Experiment Setup | Yes | Table 3 (Model Configurations), listing values for the two model configurations: Training sequence length 512 / 512; Batch size 32 × 8 / 32 × 8; Number of iterations 50k / 50k; Dropout prob. 0.0 / 0.0; Attention dropout prob. 0.0 / 0.0; Attention heads 12 / 16; Feature dimension 768 / 1024; Number of layers 12 / 24; Optimizer Adam / Adam; Optimizer betas [0.9, 0.95] / [0.9, 0.95]; Learning rate 6e-4 / 3e-4; Precision float16 / float16.
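For illustration, the hyper-parameters of the smaller configuration in Table 3 translate roughly into the following PyTorch setup; the `TransformerEncoder` here is only a placeholder for the actual language model, and the `GradScaler` line merely indicates float16 mixed-precision training rather than reproducing the authors' training loop.

```python
import torch

# Hypothetical stand-in for the 12-layer, 768-dim, 12-head model from Table 3;
# the authors' actual architecture and feed-forward width are not reproduced here.
model = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=768, nhead=12, dropout=0.0, batch_first=True),
    num_layers=12,
)

# Optimizer settings taken from Table 3 (Adam, betas [0.9, 0.95], lr 6e-4).
optimizer = torch.optim.Adam(model.parameters(), lr=6e-4, betas=(0.9, 0.95))

# float16 precision as listed in Table 3; training runs for 50k iterations.
scaler = torch.cuda.amp.GradScaler()
```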