Exploring Transformer Extrapolation
Authors: Zhen Qin, Yiran Zhong, Hui Deng
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted on the Wikitext-103, Books, Github, and Wiki Book datasets to demonstrate the viability of our discovered conditions. |
| Researcher Affiliation | Collaboration | ¹OpenNLPLab, Shanghai AI Lab, Shanghai, China; ²TapTap, Shanghai, China; ³Northwestern Polytechnical University, Shaanxi, China |
| Pseudocode | No | The paper presents mathematical proofs and equations (e.g., Eqs. 1-28) but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code is released at: https://github.com/OpenNLPLab/Rpe. |
| Open Datasets | Yes | We conduct experiments on Wikitext-103 (Merity et al. 2016), Books (Zhu et al. 2015), Github (Gao et al. 2020) and Wiki Book (Wettig et al. 2022). |
| Dataset Splits | No | The paper mentions 'max training length during training is 512' and evaluates on 'testing PPLs' by scaling inference length, but it does not provide specific train/validation/test splits or a clear description of a validation set's role and size. |
| Hardware Specification | Yes | All models are implemented in Fairseq (Ott et al. 2019) and trained on 8 V100 GPUs. |
| Software Dependencies | No | The paper states 'All models are implemented in Fairseq (Ott et al. 2019)' but does not provide specific version numbers for Fairseq or any other software dependencies used in the experiments. |
| Experiment Setup | Yes | We use the same model architecture and training configuration for all RPE variants to ensure fairness. For Wikitext-103 (Merity et al. 2016), since it is a relatively small dataset, we use a 6-layer transformer decoder structure with an embedding size of 512. For the other datasets, we use a 12-layer transformer decoder structure with an embedding size of 768. The evaluation metric is perplexity (PPL) and the max training length during training is 512. The detailed hyper-parameter settings are listed in the Appendix. |
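The Experiment Setup row above fixes only the backbone dimensions for the Wikitext-103 model: a 6-layer transformer decoder with embedding size 512, trained at a maximum length of 512 tokens. The following is a minimal PyTorch sketch of a decoder-only language model with those dimensions; the vocabulary size, head count, and feed-forward width are assumptions not stated in the excerpt, and the relative positional encoding variants the paper actually compares (and its Fairseq training configuration) are not reproduced here.

```python
import torch
import torch.nn as nn

# Backbone dimensions taken from the Experiment Setup row (Wikitext-103 model);
# vocabulary size and head count are assumptions, not values from the paper.
VOCAB_SIZE = 32_000   # placeholder vocabulary size (assumption)
EMBED_DIM = 512       # embedding size stated for Wikitext-103
NUM_LAYERS = 6        # decoder layers stated for Wikitext-103
NUM_HEADS = 8         # assumed head count
MAX_LEN = 512         # max training length stated in the setup


class DecoderLM(nn.Module):
    """Decoder-only language model with the stated dimensions.

    The RPE variants studied in the paper are omitted; this only
    illustrates the size of the backbone.
    """

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        layer = nn.TransformerEncoderLayer(
            d_model=EMBED_DIM, nhead=NUM_HEADS,
            dim_feedforward=4 * EMBED_DIM, batch_first=True)
        # An encoder stack run with a causal mask behaves as a decoder-only LM.
        self.blocks = nn.TransformerEncoder(layer, num_layers=NUM_LAYERS)
        self.lm_head = nn.Linear(EMBED_DIM, VOCAB_SIZE)

    def forward(self, tokens):
        seq_len = tokens.size(1)
        # Upper-triangular -inf mask enforces causal (left-to-right) attention.
        causal_mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        x = self.embed(tokens)
        x = self.blocks(x, mask=causal_mask)
        return self.lm_head(x)


model = DecoderLM()
logits = model(torch.randint(0, VOCAB_SIZE, (2, MAX_LEN)))  # (batch, 512, vocab)
```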
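The Dataset Splits row notes that extrapolation is measured by scaling the inference length at test time while training uses a maximum length of 512. A simple way to run that kind of check is chunked perplexity evaluation at several lengths. The sketch below assumes a `model` like the one above and a 1-D tensor of test token ids; it is a simplified illustration, not the paper's exact evaluation protocol.

```python
import math
import torch
import torch.nn.functional as F


@torch.no_grad()
def perplexity_at_length(model, token_ids, eval_len):
    """Compute PPL over non-overlapping chunks of `eval_len` tokens.

    `model` maps (batch, seq) token ids to (batch, seq, vocab) logits;
    `token_ids` is a 1-D LongTensor holding the test corpus.
    """
    model.eval()
    total_nll, total_tokens = 0.0, 0
    for start in range(0, token_ids.numel() - 1, eval_len):
        chunk = token_ids[start:start + eval_len + 1]
        if chunk.numel() < 2:
            break
        inputs = chunk[:-1].unsqueeze(0)
        targets = chunk[1:].unsqueeze(0)
        logits = model(inputs)
        nll = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            targets.reshape(-1),
            reduction="sum")
        total_nll += nll.item()
        total_tokens += targets.numel()
    return math.exp(total_nll / total_tokens)


# Sweep inference lengths beyond the 512-token training length to probe
# extrapolation (the specific lengths here are assumptions):
# for length in (512, 1024, 2048, 4096):
#     print(length, perplexity_at_length(model, test_tokens, length))
```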