Exploring Transformer Extrapolation

Authors: Zhen Qin, Yiran Zhong, Hui Deng

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted on the Wikitext-103, Books, Github, and WikiBook datasets to demonstrate the viability of our discovered conditions.
Researcher Affiliation | Collaboration | OpenNLPLab, Shanghai AI Lab, Shanghai, China; TapTap, Shanghai, China; Northwestern Polytechnical University, Shaanxi, China
Pseudocode | No | The paper presents mathematical proofs and equations (Eqs. 1-28) but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code is released at https://github.com/OpenNLPLab/Rpe.
Open Datasets | Yes | We conduct experiments on Wikitext-103 (Merity et al. 2016), Books (Zhu et al. 2015), Github (Gao et al. 2020), and WikiBook (Wettig et al. 2022).
Dataset Splits | No | The paper states that the maximum training length is 512 and reports testing PPLs while scaling the inference length, but it does not give explicit train/validation/test splits or describe the role and size of a validation set.
Hardware Specification | Yes | All models are implemented in Fairseq (Ott et al. 2019) and trained on 8 V100 GPUs.
Software Dependencies | No | The paper states that all models are implemented in Fairseq (Ott et al. 2019) but does not provide version numbers for Fairseq or any other software dependencies used in the experiments.
Experiment Setup | Yes | We use the same model architecture and training configuration for all RPE variants to ensure fairness. For Wikitext-103 (Merity et al. 2016), a relatively small dataset, we use a 6-layer transformer decoder with an embedding size of 512; for the other datasets, we use a 12-layer transformer decoder with an embedding size of 768. The evaluation metric is perplexity (PPL) and the maximum training length is 512. The detailed hyper-parameter settings are listed in the Appendix.
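
The Experiment Setup and Dataset Splits rows describe the core protocol: causal language models trained in Fairseq with a maximum context of 512 tokens and evaluated by test perplexity at progressively longer inference lengths. Below is a minimal PyTorch sketch of that evaluation loop, assuming a generic causal LM that returns per-position logits; the model, data loader, and evaluation lengths are hypothetical placeholders, not the authors' released code (see https://github.com/OpenNLPLab/Rpe for the official implementation).

```python
import math
import torch

# Sketch of the length-extrapolation evaluation described above: the model is
# trained with a 512-token context and test PPL is measured at longer inference
# lengths. Everything below marked "hypothetical" is an assumption for
# illustration, not taken from the paper or its repository.

TRAIN_LEN = 512                      # max training length reported in the paper
EVAL_LENS = [512, 1024, 2048, 4096]  # assumed inference lengths for the sweep

@torch.no_grad()
def perplexity(model, token_ids: torch.Tensor, seq_len: int) -> float:
    """PPL of a causal LM over non-overlapping chunks of `seq_len` tokens."""
    assert token_ids.numel() > seq_len, "need more test tokens than one chunk"
    model.eval()
    total_nll, total_tokens = 0.0, 0
    for start in range(0, token_ids.numel() - seq_len, seq_len):
        chunk = token_ids[start : start + seq_len].unsqueeze(0)  # (1, seq_len)
        logits = model(chunk)                                    # (1, seq_len, vocab)
        # Next-token prediction: shift logits and targets by one position.
        nll = torch.nn.functional.cross_entropy(
            logits[:, :-1].reshape(-1, logits.size(-1)),
            chunk[:, 1:].reshape(-1),
            reduction="sum",
        )
        total_nll += nll.item()
        total_tokens += seq_len - 1
    return math.exp(total_nll / total_tokens)

# Usage with hypothetical helpers:
# model = load_trained_lm(...)       # causal LM trained with context TRAIN_LEN
# test_ids = load_test_tokens(...)   # 1-D LongTensor of test-set token ids
# for L in EVAL_LENS:
#     print(f"inference length {L}: PPL = {perplexity(model, test_ids, L):.2f}")
```

In this setting, length extrapolation shows up as test PPL remaining stable, rather than blowing up, once the inference length exceeds the 512-token training length.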