On Scalar Embedding of Relative Positions in Attention Models
Authors: Junshuang Wu, Richong Zhang, Yongyi Mao, Junfan Chen
AAAI 2021, pages 14050-14057
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies show that the AT5 achieves superior performance to T5's SRPE. We evaluate the proposed AT5 model on some artificial tasks, text classification, question answering, and machine translation. (A generic sketch of scalar relative position bias in attention is given after this table.) |
| Researcher Affiliation | Academia | Junshuang Wu,1,2 Richong Zhang,1,2 Yongyi Mao,3 Junfan Chen1,2 1Beijing Advanced Institution for Big Data and Brain Computing, Beihang University, Beijing, China 2SKLSDE, School of Computer Science and Engineering, Beihang University, Beijing, China 3School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada |
| Pseudocode | No | The paper contains diagrams and descriptions of functions, but it does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code and the Appendix material are now available at https://github.com/wujsAct/Scalar-Embedding-of-Relative-Positions. |
| Open Datasets | Yes | We evaluate the proposed AT5 model on six real-world text classification datasets, including MR, SUBJ, CR, MPQA, SST, and TREC. The word embedding is initialized using GloVe (Pennington, Socher, and Manning 2014). Given a passage and a query, the SQuAD question answering task's goal is to find the answer in the passage. The total number of examples in the dataset is 107.7k. We utilize the WMT 2017 English-German (en2de) corpus as the training set and the newstest2014 dataset as the validation dataset in the machine translation (MT) task. |
| Dataset Splits | Yes | For other datasets [in text classification], the results of the 10-fold cross-validation are reported. We extract 10.1k examples as the validation set, another 10.1k examples as the test set, and the remaining 87.5k as the training set. We utilize the WMT 2017 English-German (en2de) corpus as the training set and the newstest2014 dataset as the validation dataset in the machine translation (MT) task. |
| Hardware Specification | Yes | We implement all models using Tensorflow and run the experiments on NVIDIA V100 8GB GPU or NVIDIA V100 32GB GPU. |
| Software Dependencies | No | The paper states 'We implement all models using Tensorflow' but does not provide specific version numbers for TensorFlow or any other software dependencies. |
| Experiment Setup | Yes | The dimensions of the word embedding and the hidden layers are 256 and 512. The number of heads in the self-attention module is 8. The learning rate is chosen from the set {1e-4, 5e-4}. The number of layers in all Transformer models is set to 1. The numbers of training samples are 5000 for Process-50 and 1000 for Reber and Adding-100. The number of testing samples is 5000 for all tasks. (These reported hyperparameters and the SQuAD split above are collected in the configuration sketch after this table.) |
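
The paper contrasts its AT5 model with T5's scalar relative position embedding (SRPE), in which each relative offset between a query and a key contributes a learned scalar bias to the attention logits. The NumPy sketch below is only a hedged illustration of that general idea, not the authors' AT5 and not T5's exact bucketed scheme: the clipping to a fixed `max_distance`, the single head, and all variable names are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scalar_relative_attention(q, k, v, rel_bias, max_distance=16):
    """Single-head attention with a scalar bias per clipped relative offset.

    q, k, v:   arrays of shape (seq_len, d_k)
    rel_bias:  learned scalars of shape (2 * max_distance + 1,),
               one per relative offset in [-max_distance, max_distance]
    """
    seq_len, d_k = q.shape
    # Content-based attention logits.
    logits = q @ k.T / np.sqrt(d_k)                      # (seq_len, seq_len)
    # Relative offset j - i for every query/key pair, clipped to the table range.
    offsets = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]
    offsets = np.clip(offsets, -max_distance, max_distance) + max_distance
    # Add the scalar positional bias: this is the "scalar embedding" of positions.
    logits = logits + rel_bias[offsets]
    return softmax(logits, axis=-1) @ v

# Toy usage with random inputs and a zero-initialized bias table.
rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(10, 64))
rel_bias = np.zeros(2 * 16 + 1)
out = scalar_relative_attention(q, k, v, rel_bias)
print(out.shape)  # (10, 64)
```

In a trained model the `rel_bias` table would be a learned parameter per attention head; here it is zero-initialized purely so the snippet runs end to end.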
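
For orientation, the hyperparameters and the SQuAD split quoted in the table can be gathered into a single configuration. The sketch below only restates those reported numbers; the dictionary keys and the script structure are hypothetical and are not taken from the authors' released code.

```python
# Hypothetical configuration mirroring the numbers quoted in the table above;
# key names are illustrative, not from the released repository.
config = {
    "word_embedding_dim": 256,
    "hidden_dim": 512,
    "num_attention_heads": 8,
    "num_transformer_layers": 1,
    "learning_rate_grid": [1e-4, 5e-4],     # learning rate chosen from this set
    "train_samples": {"Process-50": 5000, "Reber": 1000, "Adding-100": 1000},
    "test_samples_per_task": 5000,
    # SQuAD split as reported (approximate counts out of 107.7k examples).
    "squad_split": {"train": 87_500, "validation": 10_100, "test": 10_100},
}

if __name__ == "__main__":
    total = sum(config["squad_split"].values())
    print(f"SQuAD examples accounted for: {total / 1000:.1f}k of 107.7k")
```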