On Scalar Embedding of Relative Positions in Attention Models

Authors: Junshuang Wu, Richong Zhang, Yongyi Mao, Junfan Chen

Venue: AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical studies show that the AT5 achieves superior performance to the T5's SRPE. We evaluate the proposed AT5 model on some artificial tasks, text classification, question answering, and machine translation. (A minimal sketch of such a scalar relative-position bias is given after the table.)
Researcher Affiliation | Academia | Junshuang Wu (1,2), Richong Zhang (1,2), Yongyi Mao (3), Junfan Chen (1,2). (1) Beijing Advanced Institution for Big Data and Brain Computing, Beihang University, Beijing, China; (2) SKLSDE, School of Computer Science and Engineering, Beihang University, Beijing, China; (3) School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada
Pseudocode | No | The paper contains diagrams and descriptions of functions, but it does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code and the Appendix material are now available at https://github.com/wujsAct/Scalar-Embedding-of-Relative-Positions.
Open Datasets | Yes | We evaluate the proposed AT5 model on six real-world text classification datasets, including MR, SUBJ, CR, MPQA, SST, and TREC. The word embedding is initialized using GloVe (Pennington, Socher, and Manning 2014). Given a passage and a query, the SQuAD question answering task's goal is to find the answer in the passage. The total number of examples in the dataset is 107.7k. We utilize the WMT 2017 English-German (en2de) corpus as the training set and the newstest2014 dataset as the validation dataset in the machine translation (MT) task.
Dataset Splits | Yes | For other datasets [in text classification], the results of the 10-fold cross-validation are reported. We extract 10.1k examples as the validation set, another 10.1k examples as the test set, and the remaining 87.5k as the training dataset. We utilize the WMT 2017 English-German (en2de) corpus as the training set and the newstest2014 dataset as the validation dataset in the machine translation (MT) task. (A sketch of these splits follows the table.)
Hardware Specification | Yes | We implement all models using TensorFlow and run the experiments on an NVIDIA V100 8GB GPU or an NVIDIA V100 32GB GPU.
Software Dependencies | No | The paper states 'We implement all models using Tensorflow' but does not provide specific version numbers for TensorFlow or any other software dependencies.
Experiment Setup | Yes | The dimensions of the word embedding and the hidden layers are 256 and 512, respectively. The number of heads in the self-attention module is 8. The learning rate is chosen from the set {1e-4, 5e-4}. The number of layers in all Transformer models is set to 1. The numbers of training samples are 5000 for Process-50 and 1000 for Reber and Adding-100. The number of testing samples is 5000 for all tasks. (These values are collected into a config sketch after the table.)
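
The research-type row refers to T5's scalar relative position embedding (SRPE), i.e. a learned scalar bias added to the attention logits for each (head, relative distance) pair. Below is a minimal, simplified sketch of that idea, assuming a clipped-distance lookup table rather than the authors' or T5's exact bucketing; the class and variable names are illustrative, not taken from the released code.

```python
import tensorflow as tf

class ScalarRelativeBias(tf.keras.layers.Layer):
    """One learned scalar per head and per clipped signed relative distance."""
    def __init__(self, num_heads, max_distance=128):
        super().__init__()
        self.max_distance = max_distance
        self.table = self.add_weight(
            name="rel_bias",
            shape=(2 * max_distance + 1, num_heads),
            initializer="zeros")

    def call(self, seq_len):
        pos = tf.range(seq_len)
        rel = pos[None, :] - pos[:, None]                      # (L, L) signed distances i - j
        rel = tf.clip_by_value(rel, -self.max_distance, self.max_distance)
        bias = tf.gather(self.table, rel + self.max_distance)  # (L, L, H)
        return tf.transpose(bias, [2, 0, 1])                   # (H, L, L)

# Usage sketch: q, k have shape (batch, heads, length, depth).
# logits = tf.einsum("bhid,bhjd->bhij", q, k) / depth ** 0.5
# logits += ScalarRelativeBias(num_heads=8)(tf.shape(q)[2])[None]  # broadcast over batch
```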
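The dataset-splits row quotes two protocols: 10-fold cross-validation for the small text-classification sets, and a fixed 87.5k / 10.1k / 10.1k split of the 107.7k question-answering examples. A hedged sketch of both follows; the shuffling seed and helper names are assumptions, not taken from the paper.

```python
import random

def ten_fold_splits(examples, seed=0):
    """Yield (train, test) pairs for 10-fold cross-validation."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    fold = len(examples) // 10
    for k in range(10):
        test = examples[k * fold:(k + 1) * fold]
        train = examples[:k * fold] + examples[(k + 1) * fold:]
        yield train, test

def fixed_split(examples, n_val=10_100, n_test=10_100, seed=0):
    """Hold out n_val validation and n_test test examples; the rest is training."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    return (examples[n_val + n_test:],       # train (~87.5k of 107.7k)
            examples[:n_val],                # validation (10.1k)
            examples[n_val:n_val + n_test])  # test (10.1k)
```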
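The experiment-setup row lists the quoted hyperparameters; collecting them into a single illustrative config makes them easier to scan. The key names and dictionary structure are assumptions; the values come from the quote.

```python
# Illustrative config only; key names are assumptions, values are quoted from the paper.
EXPERIMENT_CONFIG = {
    "word_embedding_dim": 256,
    "hidden_dim": 512,
    "num_attention_heads": 8,
    "learning_rate_grid": [1e-4, 5e-4],
    "num_transformer_layers": 1,
    "train_samples": {"Process-50": 5000, "Reber": 1000, "Adding-100": 1000},
    "test_samples_per_task": 5000,
}
```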